How to evaluate your first offline marketing campaign: 5 methodologies.
CMOs, let's talk about something that might feel like a leap into the unknown: how to evaluate your first offline marketing campaign.
The goal isn't always perfect attribution, but directional correctness and actionable insights to decide: "Was this worth it, and should we do it again (or differently)?"
Before You Even Think about the Ad CTA: The Crucial Pre-Work
Rushing into an offline campaign without a measurement plan is like launching a new feature without analytics. Don't do it.
Define Crystal-Clear Objectives & KPIs:
What do you actually want to achieve?
Direct Sales Lift: Increase in purchases, subscriptions.
Lead Generation: More sign-ups, demo requests.
Website Traffic: Uplift in direct or branded search traffic.
Your KPIs will dictate your measurement methods. "Increased brand buzz" is vague; "15% increase in direct website traffic from [Target City]" is measurable.
Establish Your Baseline:
You can't measure a lift if you don't have a point of comparison.
Your data availability will largely determine which measurement methodology you can use. You will need at least 4-8 weeks, and ideally up to 2 years, of historical data on your key metrics before the campaign launches.
The Power of Power Calculation (Don't Skip This!):
What it is: Statistical power is the probability that your test will detect an effect if there actually is an effect. Low power means you might run a successful campaign but your measurement won't be sensitive enough to pick it up, leading you to (wrongly) conclude it failed.
Why it matters for startups: You have limited budget. You need to know if your campaign investment (and measurement setup) can realistically detect the minimum effect size that would make the campaign worthwhile.
Simplified: Let's say your baseline daily sales are 100 units. You estimate your campaign might bring a 10% lift (10 extra units). A power calculation helps determine if, given your normal daily sales variation, a 10-unit increase is statistically distinguishable from random noise over your campaign period. If you need a 50% lift to be sure, and you only expect 10%, your test is underpowered.
What to do:
Determine your baseline conversion rate/metric.
Define your Minimum Detectable Effect (MDE): What's the smallest lift you'd care about?
Use an online sample size calculator (search "statistical power calculator for A/B testing" – principles are similar). You'll input baseline, MDE, desired power (typically 80%), and significance level (typically 5%).
This helps you understand whether your campaign scale or measurement duration is sufficient. You might realize you need to run the campaign longer, with higher media pressure, or target a larger or smaller area.
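To make this concrete, here's a minimal sketch in Python using statsmodels (the baseline rate and MDE below are hypothetical placeholders; swap in your own numbers):

```python
# pip install statsmodels
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline_cr = 0.02   # hypothetical baseline conversion rate (2%)
mde = 0.10           # minimum detectable effect: a 10% relative lift
target_cr = baseline_cr * (1 + mde)

# Cohen's h effect size for the two conversion rates
effect_size = proportion_effectsize(target_cr, baseline_cr)

# observations needed per group at 80% power and 5% significance
n_per_group = NormalIndPower().solve_power(
    effect_size=effect_size, alpha=0.05, power=0.80, alternative="two-sided"
)
print(f"~{n_per_group:,.0f} observations per group to detect a {mde:.0%} lift")
```

If the required sample is bigger than your campaign period will realistically deliver, that's your cue to extend the flight, raise the pressure, or accept a larger MDE.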
The Offline Measurement Toolbox: Pros, Cons & "Good Enough" for Startups
Unfortunately, true A/B tests are almost impossible to run offline. Here are the methodologies I use, ranging from simple to more complex. For a first campaign, you'll likely combine one or two of the simpler ones.
1. Direct Response
How it works:
For TV and radio: in the 5-to-8-minute window after the spot airs, you should see a clear spike in your upper-funnel KPIs such as visits or installs (a sketch of this check follows below).
You embed a trackable element directly into your offline ad: Promo Codes/Discount Codes, Unique Phone Numbers or Vanity URLs/QR Codes.
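For the TV/radio spike approach, here's a minimal sketch of what that check can look like, assuming you can export per-minute visit counts; the function name and window sizes are illustrative, not a standard:

```python
import pandas as pd

def spot_lift(visits: pd.Series, spot_time: pd.Timestamp) -> dict:
    """Compare the 5-8 minute post-spot window against the pre-spot baseline.

    visits: per-minute visit counts indexed by timestamp (hypothetical export).
    spot_time: when the TV/radio spot aired, from your media plan.
    """
    one_min = pd.Timedelta(minutes=1)
    # baseline: the 30 minutes before the spot aired
    baseline = visits.loc[spot_time - 30 * one_min : spot_time - one_min]
    # response window: 5 to 8 minutes after airing
    window = visits.loc[spot_time + 5 * one_min : spot_time + 8 * one_min]

    mu, sigma = baseline.mean(), baseline.std()
    lift = window.mean() - mu
    z = lift / sigma if sigma > 0 else float("nan")
    return {"spot": spot_time, "lift_per_minute": lift, "z_score": z}
```

A z-score consistently above ~2 across several spots is a decent sign the creative is pulling people in, not just noise.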
Pros:
Direct Attribution: Relatively clear link between ad and action.
Fast Iteration: The same day you launch, you can assess the impact.
Cost-Effective: Easy and cheap to implement.
Clear CAC: Easy to calculate cost per acquisition for users of the code/URL.
Cons:
Underreporting: Not everyone will use the code/URL, even if the ad influenced them (the "halo effect"). People might see the ad, forget the code, and Google you later.
Friction: Requires an extra step from the user.
Limited Scope: Best for campaigns aiming for immediate, trackable action. Less useful for pure brand awareness.
"Good Enough" When:
Your primary goal is immediate sales/sign-ups.
You're testing different offline channels (e.g., one code for radio, another for a local paper), formats, or messages relative to each other.
You accept that it's a conservative measure of impact, but one that gives you a solid lower bound.
2. Time Series Analysis (post vs pre, with a grain of salt)
How it works: You track your KPIs (website traffic, sales, social mentions) before, during, and after the campaign. Look for significant uplifts that correlate with campaign activity.
Pros:
Simple & Accessible: Uses data you likely already have.
Good for Spotting Trends: Can show clear spikes if the impact is large enough.
If you have very good data covering more than 2 years, you can fit time series models such as Prophet (from Meta) to build a more robust baseline (see the sketch at the end of this section).
Cons:
It's the least robust methodology from a statistical point of view.
Correlation, Not Causation: Many other factors (seasonality, competitor actions, PR, even viral social posts) could cause the lift. It's hard to isolate the campaign's effect.
Noise: Normal fluctuations can obscure smaller campaign impacts.
"Good Enough" When:
You're running a short, high-impact campaign in a relatively stable market.
Your business doesn't have strong seasonality.
Used as a supporting metric alongside others (e.g., promo codes show some direct lift, and time series shows a corresponding overall bump).
You can look for "unexplained" lift after accounting for other known marketing activities.
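If you do have the 2+ years of history mentioned above, here's a minimal counterfactual sketch with Prophet (file names and dates are hypothetical; Prophet expects columns named ds and y):

```python
# pip install prophet pandas
import pandas as pd
from prophet import Prophet

# daily history before the campaign: columns ds (date) and y (KPI)
pre = pd.read_csv("daily_kpi_pre_campaign.csv", parse_dates=["ds"])  # hypothetical file

model = Prophet(yearly_seasonality=True, weekly_seasonality=True)
model.fit(pre)

# forecast a "business as usual" baseline over the campaign window
campaign_days = pd.DataFrame({"ds": pd.date_range("2024-05-01", "2024-05-28")})
forecast = model.predict(campaign_days)

# compare actuals against the counterfactual; days above yhat_upper are a real signal
actual = pd.read_csv("daily_kpi_campaign.csv", parse_dates=["ds"])  # hypothetical file
merged = actual.merge(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]], on="ds")
merged["lift"] = merged["y"] - merged["yhat"]
print(merged[["ds", "y", "yhat", "lift"]])
```

The gap between actuals and the forecast is your estimated lift; treat it as directional unless it clears the forecast's uncertainty band.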
3. Geographic Lift Analysis (Geo-Targeted A/B Testing)
How it works: You run your offline campaign in one or more "test" geographic areas (e.g., specific cities, DMAs) while leaving similar "control" areas untouched. Then, you compare the change in your KPIs (sales, website traffic from those geos, etc.) between the test and control areas during the campaign period versus the baseline period.
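The core comparison is a difference-in-differences; here's a minimal sketch with hypothetical weekly aggregates:

```python
# weekly sales for matched markets (hypothetical numbers)
test_pre, test_during = 1000.0, 1180.0        # test city: baseline vs campaign period
control_pre, control_during = 950.0, 1000.0   # control city: baseline vs campaign period

# difference-in-differences: the test market's change minus the control's,
# which nets out market-wide trends both cities share
test_change = (test_during - test_pre) / test_pre
control_change = (control_during - control_pre) / control_pre
incremental_lift = test_change - control_change
print(f"Incremental lift attributable to the campaign: {incremental_lift:.1%}")
```

For a more rigorous version (synthetic controls, confidence intervals), open-source tooling like Meta's GeoLift package can do the heavy lifting.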
Pros:
More Robust: Closer to a true A/B test; can provide stronger evidence of causality if you set it up correctly (see the Wayfair tech blog).
Measures Overall Impact: Captures both direct response and halo effects within the test region.
Versatile: Can be used for various KPIs.
Cons:
Complexity & Cost: Requires careful selection of statistically similar test/control markets. The campaign itself might be more expensive if you need specific geo-targeting (e.g., certain radio stations, local print).
Data Challenges: Need reliable geo-tagged sales/traffic data. External factors (local events, competitor activity) can contaminate results.
Spillover: People from control areas might be exposed to the campaign (e.g., commuters).
"Good Enough" When:
Your plan is to test first and then scale.
You can target geographically.
Your product/service has clear geographic customer distribution.
You've done your pre-work (power analysis is key here!) to ensure markets are comparable and the expected lift is detectable.
Pro-Tip: Use a "matched market" approach. Find two or more cities that are demographically similar and have historically tracked similarly on your KPIs.
4. Brand Lift Studies (Often for Later, But Good to Know)
How it works: Surveys conducted before, during, and/or after the campaign to measure changes in brand awareness, perception, purchase intent, ad recall, etc., typically within the target audience. Often run by third-party providers, especially for larger campaigns.
Pros:
Measures "Softer" Metrics: Essential if brand building is a primary goal.
Can Isolate Campaign Impact: When done rigorously (e.g., control vs. exposed groups).
Accessible Tooling: Self-serve brand lift studies are available in Google's and Meta's ad platforms.
Cons:
Pre-Work: You need to field the pre-survey before the campaign launches. Finding a provider and running the surveys can take up to 5 weeks.
Cost: Can be expensive, often requiring specialist survey panels or providers.
"Good Enough" When:
Your growth problem is related to branding.
Probably not for your very first, lean offline experiment unless brand awareness is the absolute primary goal and you have the budget.
More relevant once you're scaling successful offline channels.
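If you do run one, the exposed-vs-control comparison reduces to a two-proportion test; here's a minimal sketch with hypothetical survey counts:

```python
# pip install statsmodels
from statsmodels.stats.proportion import proportions_ztest

# hypothetical survey results: respondents who recalled the brand
aware = [180, 140]          # [exposed group, control group]
respondents = [1000, 1000]  # sample size per group

z_stat, p_value = proportions_ztest(aware, respondents)
lift = aware[0] / respondents[0] - aware[1] / respondents[1]
print(f"Awareness lift: {lift:.1%} (p = {p_value:.3f})")
```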
5. Marketing Mix Modelling (MMM)
How it works: MMM uses statistical models (often regression-based or Bayesian models) to quantify the impact of various marketing inputs (ad spend across channels, promotions, etc.) and non-marketing factors (seasonality, economic conditions, competitor actions, even pandemics) on a key outcome, typically sales or conversions.
You gather historical data (ideally 2-3 years) for all your marketing channel spends, impressions or GRPs (for TV/Radio) along with your sales/conversion data.
The model then estimates the contribution of each marketing channel to the outcome, controlling for external factors.
It can also estimate concepts like adstock (the carryover effect of advertising) and saturation points (diminishing returns).
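To make adstock and saturation concrete, here's a minimal sketch of both transformations (the decay and half-saturation values are illustrative, not recommendations):

```python
import numpy as np

def adstock(spend: np.ndarray, decay: float = 0.5) -> np.ndarray:
    """Geometric adstock: each period carries over a fraction of the last one's effect."""
    out = np.zeros(len(spend))
    carry = 0.0
    for t, s in enumerate(spend):
        carry = s + decay * carry
        out[t] = carry
    return out

def saturate(x: np.ndarray, half_sat: float) -> np.ndarray:
    """Simple diminishing-returns curve: response flattens as effective spend grows."""
    return x / (x + half_sat)

# hypothetical weekly TV spend (in $k), transformed before entering the regression
tv_spend = np.array([0, 50, 50, 0, 0, 80, 80, 80, 0, 0], dtype=float)
tv_effect = saturate(adstock(tv_spend, decay=0.6), half_sat=60.0)
print(np.round(tv_effect, 2))
```

These transformed series, one per channel, are what the model's regression actually sees on the right-hand side.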
Pros:
Holistic View: Provides a comprehensive understanding of how different marketing levers (and external factors) drive performance.
Strategic Insights: Helps in optimal budget allocation across channels by identifying ROI and response curves.
Measures "Untrackable" Impact: Can capture the influence of brand-building activities and offline channels where direct attribution is hard.
Scenario Planning: Allows you to simulate the potential impact of different budget allocations or market conditions.
Cons:
Data Intensive: Requires significant amounts of clean, granular historical data, which can be a challenge for new startups.
Complexity & Expertise: Implementing MMMs typically requires specialized statistical knowledge. Even with the new packages from Meta, Google, or PyMC Labs, it's not a quick DIY.
Time-Consuming: Data collection, cleaning, model building, and iteration can take weeks or even months.
Backward-Looking: Based on historical data, so it might not fully capture the impact of very new strategies or changing market dynamics.
Correlation, Not Always Perfect Causation: While more robust than simple time series, misspecified models can still lead to spurious conclusions if not carefully built and validated.
"Good Enough" When:
You have at least 1-2 years of consistent marketing activity and sales data across multiple channels.
You've outgrown simpler methods and need a more strategic way to allocate a larger, more diverse marketing budget.
You need to justify marketing spend and demonstrate ROI to stakeholders (e.g., board, investors) at a more aggregate level.
You are looking to understand diminishing returns and optimize your channel mix for long-term growth, rather than just immediate CPA.
You are considering or using newer, more accessible MMM tools or open-source solutions (like Meta's Robyn or Google's LightweightMMM) that lower the barrier to entry.
Pro-Tip: Start with good data hygiene from day one, even if MMM seems far off. The better your historical data, the more reliable your future MMM will be. Consider it a more advanced tool to use once you have established some marketing traction and data history.
Choosing Your Method(s):
For a startup's first offline foray, perfection can be the enemy of progress. Aim for "good enough" data to make an informed decision. The exact numbers may differ between methodologies, but the direction should be consistent across them.
If your budget is very tight: Direct Response + careful Time Series Analysis is your go-to. Be very clear about the limitations.
If you have a bit more budget and can target geographically: Geo Lift Analysis is powerful. Combine it with the above.
If you have the data and the resources, you can implement an MMM after a quick channel-validation test.
The Startup CMO's Mindset for Offline Measurement
Be Realistic: Acknowledge that offline attribution will never be as precise as digital, and set prior expectations: run a power analysis so that if you don't see an effect, you know it's because the campaign didn't work, not because the media pressure was too low to detect it.
Focus on Directional Accuracy: Are things trending up because of the campaign? By roughly how much?
Iterate and Learn: Your first campaign is also an experiment in measurement. What worked? What was too noisy? The objective is to force yourself to decide what to keep and what to kill in the next iteration.