
We operate in a world of
increasing data privacy regulation, growing fragmentation across channels and ever-more complex media ecosystems. Against that backdrop, the demand for richer, more granular and more complete datasets
to power robust marketing mix modeling (MMM) has never been higher.
MMM has always been powered by the quality of its data. If you get that right, everything else follows. The
impending convergence of generative AI, large language models (LLMs) and synthetic data is going to be genuinely transformative. Not just for measurement, but for the entire data infrastructure that
underpins it.
Filling the gaps with artificial intelligence
It’s estimated that between 60% and 80% of analytical work involves data
wrangling: pulling data together, reconciling sources, resolving gaps and so on. For MMM, that figure feels, if anything, conservative.
Clients routinely contend with missing variables.
Media channels may be reported only at a total level, without platform-level splits. Competitor activity can be difficult to source. Hyper-local indicators such as weather patterns, local economic
signals or foot traffic data are rarely available in a consistent format. Historically, these limitations have forced compromises in model design and reduced confidence in outputs.
This is where synthetic data will begin to play an important role. Synthetic data is artificially generated data that reproduces the statistical characteristics and structure of real
datasets. It is not random dummy data. When generated responsibly, by models trained on robust benchmarks, proprietary datasets or anonymized patterns across industries, it can help fill gaps where
information is incomplete or unavailable.
Although we’re not quite there yet, synthetic augmentation will increasingly allow us to construct statistically sound datasets that
mirror real-world behavior and enable more comprehensive modeling than would otherwise be possible. Over time, responsibly developed norms databases built from these techniques could provide stronger
foundations for measurement across industries.
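To make the idea concrete, here is a minimal sketch in Python of the simplest version of the technique: estimate the statistical structure of a real dataset, then sample fresh rows from that estimate. All the variables and figures below are invented for illustration, and production generators are far more sophisticated (copulas, deep generative models), but the principle is the same.

```python
import numpy as np

rng = np.random.default_rng(42)

# Invented stand-in for "real" weekly marketing data: TV spend,
# search spend and sales over three years of weekly observations.
real = rng.multivariate_normal(
    mean=[100.0, 40.0, 500.0],
    cov=[[400.0, 60.0, 300.0],
         [60.0, 100.0, 150.0],
         [300.0, 150.0, 900.0]],
    size=156,
)

# Fit the generator: estimate the mean vector and covariance matrix
# from the real data, then sample synthetic rows from that fit.
mu = real.mean(axis=0)
sigma = np.cov(real, rowvar=False)
synthetic = rng.multivariate_normal(mu, sigma, size=156)

# The synthetic rows mirror the statistical structure of the real
# ones (means, variances, correlations) without reproducing any
# individual real observation.
print(np.round(np.corrcoef(real, rowvar=False), 2))
print(np.round(np.corrcoef(synthetic, rowvar=False), 2))
```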
Scenario planning at real scale
Since the recent wave of privacy regulation, the industry has lost
access to some of its most informative data sources. Sharing customer transaction data, CRM records and granular audience segments with external partners has become legally and reputationally complex,
and rightly so.
Synthetic data offers a compelling solution. Organizations can generate statistically representative synthetic versions of their own proprietary data and share those
with measurement partners, preserving privacy while restoring the analytical depth that has been lost. The ability to examine customer transaction patterns or niche audience segments
without exposing underlying personal data is, frankly, exciting. It opens doors that regulation has, understandably, closed.
Scenario planning has always been a strength of
well-executed MMM. However, building hypothetical datasets to stress-test model robustness or explore future media strategy has historically been time-consuming and constrained by human bandwidth.
With synthetic data, we will be able to generate thousands of plausible future market conditions. That means testing the resilience of our models, pressure-testing a client’s data
strategy and exploring the implications of different media investment scenarios at a depth and speed that simply were not achievable before. This capability will only sharpen as the tools mature, and
I expect it to become a defining feature of MMM practice within the next two to three years.
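As a sketch of what that exercise could look like, the snippet below assumes a simple diminishing-returns response curve has already been fitted by an MMM; the coefficients, budget ranges and demand assumptions are all invented. The point is the shape of the workflow: simulate thousands of plausible futures and read off a distribution of outcomes rather than a single point forecast.

```python
import numpy as np

rng = np.random.default_rng(7)
n_scenarios = 10_000

# Invented response curve standing in for a fitted MMM: a base level
# scaled by macro demand plus diminishing returns on media spend.
def predicted_sales(tv_spend, search_spend, demand_index):
    return (200.0 * demand_index
            + 80.0 * np.log1p(tv_spend)
            + 45.0 * np.log1p(search_spend))

# Thousands of plausible future market conditions: candidate budgets
# under consideration and uncertain demand around today's level.
tv = rng.uniform(50, 200, n_scenarios)
search = rng.uniform(20, 100, n_scenarios)
demand = rng.normal(1.0, 0.1, n_scenarios)

sales = predicted_sales(tv, search, demand)

# Report the range of outcomes, not a single number.
print(f"median sales: {np.median(sales):.0f}")
print(f"5th-95th percentile: {np.percentile(sales, 5):.0f}"
      f" to {np.percentile(sales, 95):.0f}")
```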
From unstructured information to modeling signals
Beyond synthetic data, LLMs are unlocking a different kind of opportunity. As deep learning models trained on massive text datasets, they can make the vast universe of
unstructured data legible to MMM.
MMM has always relied predominantly on structured data that’s clean, labeled and tabular. But enormous quantities of contextually rich
information exist in text documents, news sentiment, industry reports, internal survey results and qualitative research outputs that have never been systematically incorporated into models.
LLMs can increasingly help convert that unstructured information into structured signals that models can incorporate. Being able to systematically account for sentiment shifts, news cycles
or broader market narratives alongside traditional variables could add valuable context and explanatory power to measurement models.
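A minimal sketch of that conversion might look like the following. A crude keyword rule stands in for the LLM call so the example stays self-contained and runnable; the headlines are invented, and a real pipeline would prompt a hosted model for the score instead.

```python
from statistics import mean

def llm_sentiment(text: str) -> float:
    # Hypothetical stand-in for an LLM call that returns a sentiment
    # score in [-1, 1]. A real version would ask a hosted model to
    # rate the text and parse the numeric reply; here a crude keyword
    # rule keeps the sketch self-contained.
    positive = ("growth", "record", "cuts prices", "strong")
    negative = ("recall", "strike", "shortage", "raises prices")
    score = sum(w in text.lower() for w in positive)
    score -= sum(w in text.lower() for w in negative)
    return max(-1.0, min(1.0, score / 2.0))

# Unstructured inputs: news headlines grouped by the week they ran.
headlines_by_week = {
    "2024-W01": ["Retailer cuts prices ahead of holidays",
                 "Supply shortage hits electronics category"],
    "2024-W02": ["Strong demand drives record quarter"],
}

# Convert free text into one structured weekly signal that an MMM
# can use alongside spend, pricing and seasonality variables.
sentiment_index = {
    week: mean(llm_sentiment(h) for h in texts)
    for week, texts in headlines_by_week.items()
}
print(sentiment_index)  # e.g. {'2024-W01': 0.0, '2024-W02': 1.0}
```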
Agentic AI workflows, which go beyond static
predictions to execute multi-step plans and make decisions, will also reduce the time it takes to get from messy, fragmented data streams to model-ready inputs. They should help accelerate data
collection, validation and cleaning. Longer term, they are expected to deliver self-governing, self-healing data pipelines able to identify anomalies, predict breakages and harmonize disparate
sources automatically, ultimately becoming standard infrastructure.
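The simplest building block of such a pipeline is an automated sanity check on incoming feeds. The sketch below flags a new observation that sits far outside a feed’s historical range, a common symptom of a broken tracker or a changed report format; the threshold and feed values are invented, and an agentic system would layer planning and remediation on top of checks like this.

```python
import numpy as np

def check_feed(history: np.ndarray, latest: float, z_threshold: float = 4.0) -> str:
    # Flag a new observation far outside the feed's historical range.
    mu, sd = history.mean(), history.std()
    if sd == 0:
        return "flat history - cannot score"
    z = abs(latest - mu) / sd
    return "anomaly - hold for review" if z > z_threshold else "ok - load"

rng = np.random.default_rng(1)
weekly_impressions = rng.normal(1_000_000, 50_000, size=104)  # invented feed

print(check_feed(weekly_impressions, 1_020_000))  # ok - load
print(check_feed(weekly_impressions, 0.0))        # anomaly - hold for review
```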
Speed, trust and the human element
None of this, however, changes the
fundamental requirement for human expertise and authentic partnership. Synthetic data is an augmentation of, not a replacement for, real data. Models built on biased or poorly governed synthetic inputs
will produce poor outputs, and no amount of computational sophistication changes that. The human role remains: identifying the right model, interpreting outputs in context and ensuring that the
assumptions embedded in any AI-assisted process are sound. That role becomes more important, not less, as these AI-powered tools proliferate.
Filling data gaps intelligently,
unlocking first-party data safely, sharpening scenario planning and transforming unstructured information into modeling-ready insight will soon be realities for marketers. Yet
authenticity, transparency and trust will remain the non-negotiables of good marketing prediction and measurement practice.
The tools are evolving quickly, but the
fundamentals have not changed: synthetic data augments, it does not replace; AI accelerates, it does not decide. The organizations that lead in this space will combine genuine technical
capability with the econometric and measurement expertise to deploy it well, alongside partners they can trust to tell them the difference.