Workflow | Intermediate

Build a Campaign Reporting Agent for Monday-Morning Reports That Write Themselves

Set up an AI agent that pulls campaign data from your ad platforms and analytics every week, writes the narrative, flags anomalies, and delivers a report people actually read.

reporting, ai-agents, analytics, automation, dashboards, analytics lead, marketing ops manager, paid media specialist, marketing leader

Published 2026-05-15

What this workflow does

Weekly reporting is high-stakes and low-creativity: pull numbers from four platforms, paste into a deck, write "performance was up week over week" for the ninth consecutive time. This workflow builds an agent that does the pulling, the comparing, and the first-draft narrating — and, more usefully, flags what a human should actually look at. The deliverable is a short written report in Slack or email every Monday at 8am: numbers, week-over-week deltas, anomalies with hypotheses, and open questions.

Outcome: 3–5 hours of analyst time returned weekly, faster anomaly detection (the agent notices the tracking break on day 1, not at month end), and a report format consistent enough that stakeholders build a reading habit.

Prerequisites

API or connector access to your data sources: ad platforms (Meta, Google, LinkedIn), GA4 or your product analytics, and CRM for down-funnel numbers
An automation platform (n8n or Make have the broadest marketing connectors) or a scheduled script if you have engineering support
An LLM API for narrative and anomaly analysis
Agreed KPI definitions — in writing — including which source is authoritative for each number
A delivery channel and a named human reviewer

The workflow, step by step

Step 1: Settle the metric contract first (2–3 hours, one time)

Most reporting-automation projects die on definitional ambiguity, not technology. Before building anything, write a metric contract: each KPI, its exact formula, its authoritative source, and its comparison basis (WoW, vs. 4-week average, vs. target). Example: "CPL = Meta spend ÷ CRM-created leads with source=paid-social, weekly, compared to trailing 4-week mean." The agent will faithfully automate whatever ambiguity you leave here.
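One way to keep the contract honest is to store it as data, so a blank field is caught before anything is built on top of it. The sketch below is illustrative only: the MetricDefinition class, its field names, and the missing_fields helper are hypothetical, and the CPL entry simply restates the example above.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class MetricDefinition:
    """One row of the metric contract (hypothetical structure)."""
    name: str
    formula: str
    numerator_source: str    # authoritative system for the numerator
    denominator_source: str  # authoritative system for the denominator
    grain: str               # reporting period, e.g. "weekly"
    comparison: str          # comparison basis, e.g. "trailing 4-week mean"


def missing_fields(metric: MetricDefinition) -> list[str]:
    """Return the names of fields left blank, in declaration order."""
    return [field for field, value in asdict(metric).items() if not value.strip()]


# The CPL example from the text, written as a contract entry.
CPL = MetricDefinition(
    name="CPL",
    formula="Meta spend / CRM-created leads with source=paid-social",
    numerator_source="Meta",
    denominator_source="CRM",
    grain="weekly",
    comparison="trailing 4-week mean",
)

print(missing_fields(CPL))
```

An empty list means every part of the definition is written down; anything else is ambiguity the agent would otherwise automate.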

Step 2: Build the extraction layer

Schedule the platform pulls for Monday 6am: last full week's data per source, plus the prior 8 weeks for baselines. Normalize into one table (a Google Sheet or database table is fine): date, channel, campaign, spend, impressions, clicks, conversions, revenue/pipeline. Pull raw numbers and compute derived metrics (CPA, ROAS, CVR) in your own step — don't let each platform's dashboard math leak into the report.
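As a rough sketch of the "compute derived metrics in your own step" idea, the function below works on one row of the normalized table. The formulas used here (CPA = spend ÷ conversions, ROAS = revenue ÷ spend, CVR = conversions ÷ clicks) are common conventions assumed for illustration; your metric contract is the authority. The helper names are hypothetical.

```python
from typing import Optional


def safe_div(numerator: float, denominator: float) -> Optional[float]:
    """Divide, returning None instead of raising when the denominator is zero."""
    return numerator / denominator if denominator else None


def derive_metrics(row: dict) -> dict:
    """Add CPA, ROAS, and CVR to a normalized row of raw platform numbers."""
    return {
        **row,
        "cpa": safe_div(row["spend"], row["conversions"]),   # assumed formula
        "roas": safe_div(row["revenue"], row["spend"]),      # assumed formula
        "cvr": safe_div(row["conversions"], row["clicks"]),  # assumed formula
    }


row = {"channel": "meta", "spend": 500.0, "impressions": 40000,
       "clicks": 200, "conversions": 10, "revenue": 1500.0}
print(derive_metrics(row))
```

Returning None for a zero denominator keeps a week with no conversions visible as a gap rather than crashing the Monday run.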

Step 3: The anomaly pass

Before any narrative, run a deterministic check: for each metric per channel, is this week's value more than 2 standard deviations away from the mean of the trailing 8 weeks? Flag those rows. Then let the LLM hypothesize on flags only:

These metrics broke from their 8-week baseline this week: [flagged rows,
with context: budget changes, launches, holidays from the campaign log].
For each, give the 2 most likely explanations, ranked, and one check a
human could run to confirm. Distinguish between "performance changed"
and "measurement changed" explanations — always consider tracking
breakage. Do not explain unflagged metrics.

Keeping detection deterministic and interpretation LLM-based gives you the reliability of statistics with the readability of language.
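The deterministic half of this step can be a few lines of standard-library Python. This is a minimal sketch, assuming the sample standard deviation (statistics.stdev) and a hypothetical is_anomalous helper; with only 8 baseline points the estimate is noisy, so treat the threshold as a starting point.

```python
from statistics import mean, stdev


def is_anomalous(current: float, baseline: list[float], threshold: float = 2.0) -> bool:
    """Flag a value more than `threshold` standard deviations from the baseline mean."""
    if len(baseline) < 2:
        return False  # not enough history to estimate variance
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation (an assumption)
    if sigma == 0:
        return current != mu  # flat baseline: any change is a break
    return abs(current - mu) / sigma > threshold


# Trailing 8 weeks of a metric: mean 100, sample standard deviation 2.
trailing = [100, 102, 98, 101, 99, 100, 103, 97]
print(is_anomalous(105, trailing))  # 2.5 standard deviations above the mean
print(is_anomalous(103, trailing))  # 1.5 standard deviations above the mean
```

Only rows where this returns True would be passed to the hypothesis prompt.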

Step 4: Generate the report

One synthesis call with the normalized table, flags, hypotheses, and the campaign log:

Write the weekly marketing performance report. Audience: [team/execs].
Format:
1. Headline: one sentence — the week in a nutshell
2. Scorecard table: KPI, this week, vs 4-wk avg, vs target, trend arrow
3. What needs attention: the flagged anomalies with hypotheses (max 3)
4. What's working: max 2 genuine positives with numbers
5. Open questions for the team (max 2)
Under 400 words outside the table. Never claim causation the data
doesn't support — use "coincides with," not "because of."
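The two hard constraints in that prompt (the word limit and the causation wording) can also be checked after generation rather than trusted. The sketch below is an assumption-heavy illustration: check_draft is a hypothetical helper, and it assumes scorecard table rows are the lines starting with "|", which depends on how your draft is formatted.

```python
import re


def check_draft(report_text: str, max_words: int = 400) -> list[str]:
    """Return a list of problems found in a generated report draft."""
    problems = []

    # Assumption: table rows start with "|"; everything else counts as prose.
    prose_lines = [line for line in report_text.splitlines()
                   if not line.lstrip().startswith("|")]
    word_count = len(" ".join(prose_lines).split())
    if word_count > max_words:
        problems.append(f"{word_count} words outside the table (limit {max_words})")

    # The prompt asks for "coincides with" instead of "because of".
    if re.search(r"\bbecause of\b", report_text, flags=re.IGNORECASE):
        problems.append('causal phrasing found: "because of"')

    return problems


draft = "CPL rose 18% because of the new creative.\n| KPI | This week |\n| CPL | $42 |"
print(check_draft(draft))
```

A non-empty result could trigger one regeneration or simply be shown to the reviewer alongside the draft.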

Step 5: Human review, then send

The named reviewer gets the draft at 7am, spends 10 minutes checking the anomaly hypotheses against what they know (the agent doesn't know sales asked to pause a campaign on Thursday), edits, and releases at 8am. Log every edit: the reviewer's corrections are the raw material for the monthly feedback loop.

Do not skip the review in the first quarter of operation. An auto-sent report with a wrong number costs more credibility than the workflow ever saves.
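Logging edits does not require special tooling. As a minimal sketch, the standard-library difflib module can record which lines the reviewer changed between the draft and the released version; the edit_log_entry helper and the shape of the log entry are hypothetical.

```python
import difflib


def edit_log_entry(draft: str, final: str, report_date: str) -> dict:
    """Record the lines a reviewer removed (-) or added (+) before release."""
    diff = list(difflib.unified_diff(
        draft.splitlines(), final.splitlines(),
        fromfile="draft", tofile="final", lineterm="",
    ))
    # Skip the two file-header lines, then keep only added/removed lines.
    changed = [line for line in diff[2:] if line.startswith(("+", "-"))]
    return {"report_date": report_date, "changed_lines": changed}


draft = "CPL rose because of the launch.\nSpend was flat."
final = "CPL rose, coinciding with the launch.\nSpend was flat."
print(edit_log_entry(draft, final, "2026-05-18"))
```

Appending one such entry per week to a file or sheet gives the monthly review a concrete list of corrections to work from.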

Failure modes and fixes

Numbers don't match the platform dashboards. Attribution windows and time zones, almost always. Pin every API pull to explicit windows/zones in the metric contract and footnote them in the report.
The narrative is confident nonsense. The model explains normal variance as strategy. That's why Step 3 restricts interpretation to statistically flagged rows. If fluff persists, tighten the "do not explain unflagged metrics" instruction and lower the model's temperature setting.
A source API fails silently and the report ships with holes. Add a completeness gate: if any source returns empty or >20% below its usual row count, the report goes out with a visible "DATA INCOMPLETE: [source]" banner — or holds for human intervention.
Stakeholders stop reading by week eight. The report is describing, not deciding. Ruthlessly improve the quality of the "needs attention" and "open questions" sections; cut anything nobody has acted on in a month.
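The completeness gate from the list above could look something like the following. It is a sketch under stated assumptions: completeness_banners is a hypothetical helper, and "usual row count" is taken to be a number you maintain per source (for example, the median of recent weeks).

```python
def completeness_banners(row_counts: dict, usual_counts: dict,
                         max_drop: float = 0.20) -> list[str]:
    """Return a banner for each source that is empty or well below its usual size."""
    incomplete = []
    for source, usual in usual_counts.items():
        received = row_counts.get(source, 0)  # a missing source counts as empty
        if received == 0 or received < usual * (1 - max_drop):
            incomplete.append(source)
    return [f"DATA INCOMPLETE: {source}" for source in sorted(incomplete)]


usual = {"meta": 100, "google": 200, "linkedin": 50}
this_week = {"meta": 95, "google": 150, "linkedin": 0}
print(completeness_banners(this_week, usual))
```

An empty list lets the report proceed as normal; otherwise the banners go at the top of the report, or the run holds for human intervention.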

Turning it into a loop

Three feedback channels, reviewed monthly:

Edit log: feed the reviewer's corrections back — "here are 12 edits humans made to your drafts; what systematic instruction changes would prevent them?" Update the report prompt accordingly.
Anomaly verdicts: for each flagged anomaly, record what the cause actually turned out to be. Feed confirmed patterns back into the hypothesis prompt ("in this account, Meta CPL spikes have historically been caused by X 60% of the time").
Question outcomes: track which "open questions" led to real decisions. The agent learns which kinds of questions this team finds valuable.
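For the anomaly-verdict channel, the "historically caused by X" figure can be computed from the verdict log rather than estimated. A minimal sketch, assuming each verdict is a dict with hypothetical keys channel, metric, and confirmed_cause; the sample data is invented for illustration.

```python
from collections import Counter


def cause_shares(verdicts: list[dict], channel: str, metric: str) -> list[tuple[str, float]]:
    """Return confirmed causes for one channel/metric, most frequent first, with shares."""
    causes = Counter(
        v["confirmed_cause"] for v in verdicts
        if v["channel"] == channel and v["metric"] == metric
    )
    total = sum(causes.values())
    return [(cause, count / total) for cause, count in causes.most_common()]


# Invented example log: five confirmed verdicts for Meta CPL spikes.
log = (
    [{"channel": "meta", "metric": "cpl", "confirmed_cause": "tracking break"}] * 3
    + [{"channel": "meta", "metric": "cpl", "confirmed_cause": "budget change"}] * 2
)
for cause, share in cause_shares(log, "meta", "cpl"):
    print(f"{cause}: {share:.0%} of confirmed cases")
```

With only a handful of verdicts these shares are weak evidence, so it may be worth passing the raw counts to the hypothesis prompt alongside the percentages.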

After a quarter, the agent's first drafts need fewer edits, its hypotheses rank the historically-likely cause first, and the human reviewer's job shrinks toward a genuine 5-minute sanity check — which is the correct end state. Not zero human review; minimal, high-leverage human review.