Commentary

Reliability Vs. Validity

Is the purpose of the new joint industry committee (JIC) to improve TV audience measurement quality, or is it gamesmanship?

Is it even possible to validly measure viewing using alternative data sources such as TV set-top boxes (STB) or smart TVs?

When TV set-top box data first became available, back around 2008, there was a little company that tried to use it to measure TV audiences. It struggled to gain traction and ended up suing Nielsen for anticompetitive practices related to Nielsen’s staggered contracts. It lost and went belly up.

During this time, Nielsen started a new division to see whether STB data could be used to make better TV ratings. It was called DigitalPlus at first and was later renamed Nielsen Advanced TV. Nielsen’s then-CEO tasked the group with disrupting the company, and the president of client services told the team that they had all been handpicked for the role.

The team did some good work until Nielsen’s new private-equity owners began to cut everything not attached to the core business in order to pay down debt from the leveraged buyout. Most of the team was let go or, if they were lucky, moved to other departments.

Much of what the DigitalPlus team learned is still true today. The TV set-top box data had good reliability but poor validity: the results were consistent, but they weren’t right.

  • Sample sizes were large, which made the data very stable, but it didn’t include viewing in broadcast-only homes.

  • The data recorded tuning even when the TV set was off, and the data from the different MVPDs that provided it came in different formats. It had to be normalized before it could be used, and even then it was far from perfect.

  • Perhaps worst of all, the TV set-top box data lacked information on who in the home was doing the viewing, so that had to be imputed. The same issue applies to smart TV data.

When Nielsen compared TV viewing from the STBs to its people-meter panel data from the same households, the STB data came out quite a bit higher. On top of that, it was coming out higher even while missing more than 30% of viewing from TVs not attached to an STB.
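
The distinction is easy to see in a simulation. Below is a minimal, hypothetical sketch of the trade-off the DigitalPlus team ran into: a huge but biased "STB-style" sample is very stable (reliable) yet consistently wrong, while a small unbiased "panel-style" sample is noisier but centered on the truth. Every number here is invented for illustration.

```python
# Reliability vs. validity in miniature: all figures below are hypothetical.
import random

random.seed(7)
TRUE_RATING = 0.10  # assumed true share of homes viewing a program

def simulate(n, bias, trials=100):
    """Estimate the rating `trials` times from samples of size `n`.

    `bias` shifts each home's recorded viewing probability, standing in
    for STB artifacts such as sets left on or missing broadcast-only homes.
    """
    p = TRUE_RATING + bias
    estimates = [sum(random.random() < p for _ in range(n)) / n
                 for _ in range(trials)]
    mean = sum(estimates) / trials
    sd = (sum((e - mean) ** 2 for e in estimates) / trials) ** 0.5
    return mean, sd

stb_mean, stb_sd = simulate(n=20_000, bias=0.03)    # big but biased
panel_mean, panel_sd = simulate(n=1_000, bias=0.0)  # small but unbiased

# The big sample's estimates barely move (high reliability) yet sit ~30%
# above the truth (poor validity); the panel wobbles more but is centered.
print(f"STB-style:   mean={stb_mean:.3f}, sd={stb_sd:.4f}")
print(f"Panel-style: mean={panel_mean:.3f}, sd={panel_sd:.4f}")
```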

It’s now 15 years later, and some of these issues may have been resolved for some datasets. However, old legacy equipment remains in the marketplace, and cord-cutting has accelerated. Broadcast-only homes haven’t gone away, nor has the problem that STB and smart TV data only include household-level information.

As a result of these shortcomings, there remains a need to “tune” the STB and smart TV data.

To tune the data, research companies need a source of truth to tune it to. If these companies don’t have a panel of their own – and if you look at the competitors to Nielsen, none have multi-million-dollar TV research panels – then how are they tuning the data?

What is their source of truth?

One can only surmise that they must be tuning their data to Nielsen. They have their own proprietary imputation models, but without Nielsen’s panel, how would they know those models are right?
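
Vendor imputation and adjustment models are proprietary, and this column doesn’t describe them. Purely as an illustration, here is a minimal sketch of one generic way “tuning” can work: a simple ratio calibration against a trusted panel. Every program name and figure below is hypothetical.

```python
# A generic ratio-calibration sketch (not any vendor's actual method):
# scale big-data ratings so they line up with a trusted panel overall.

# Hypothetical raw ratings from STB/smart TV data (inflated, household-level)
# and panel "truth" for a few programs where both sources overlap.
stb_raw = {"News at 6": 4.2, "Drama A": 3.1, "Game Show B": 2.5}
panel_truth = {"News at 6": 3.0, "Drama A": 2.3, "Game Show B": 1.9}

# One overall calibration factor: how much the big data overstates the panel.
factor = sum(panel_truth.values()) / sum(stb_raw.values())

# Apply the factor to every STB estimate -- including programs too small
# for a panel to measure reliably on its own.
stb_tuned = {program: rating * factor for program, rating in stb_raw.items()}

for program, tuned in stb_tuned.items():
    print(f"{program}: raw={stb_raw[program]:.2f} -> tuned={tuned:.2f}")

# The column's point: without a panel of your own, `panel_truth` has to
# come from somewhere -- and in this market, that somewhere is Nielsen.
```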

The answer to the question about whether the JIC is about TV audience measurement quality or gamesmanship is that it is a little of both.

It’s about quality because the JIC could ask for certification requirements beyond what the Media Rating Council demands, which could improve TV measurement accuracy. And it’s gamesmanship because the industry can’t accurately measure TV without Nielsen.

3 comments about "Reliability Vs. Validity".
  1. Ed Papazian from Media Dynamics Inc, May 1, 2023 at 10:05 a.m.

    Excellent article. Too many people confuse the two terms ---stability and validity. They also assume ---incorrectly---that the larger the sample, the more accurate its findings will be. This is based on the barrage of misinformation we get from statisticians and researchers, as well as TV news, which cites the results of poll after poll and, as a matter of cautioning the viewers, keeps telling them which findings fall within the "error" margin---based on sample-size calculations which do nothing more than estimate the odds of getting the same answer if the same study were replicated. This has nothing to do with accuracy---only stability---but almost nobody, including many in the ad or media business, seems to understand this distinction. Folks---you can't determine a survey's accuracy by knowing its sample size. Period.

    As for TV audience measurement, what Nielsen has on a national level is a collection of panels---some with people meters, some without---which record set activity electronically. Being a panel---or collection of panels---members drop out---or are terminated---for a variety of reasons, and even if replacements are sought in a scientific manner---or lost homes are replaced with new homes of similar characteristics---it remains questionable whether the resulting panel truly represents the whole nation at any given point in time. But it's all we have and, in my opinion, it's far better than many of the alternatives being proposed as replacements---using big data ACR or STB panels.

    Stability and sample size are another issue---but that's largely a question of cost and whether the big audience sellers are willing to pay substantially more just so the very small audience sellers---who can't afford the added sample size---get more stable ratings. Why would they?

    The real tragedy is that we have forgotten what should be the real question---"What do we mean by audience?" What we are going to get will be basically a digital-style "audience" measurement---based on device usage and the assumption that the "viewer"---or owner of the device---"watched" every bit of program and commercial content, so long as it remained on-screen. Which is nonsense, and it produces a hugely inflated picture of the ad reach and frequency attained by most advertisers, as well as misleading them about who was "watching" their commercials. But this is what the sellers want, and as they do most of the funding while advertisers pay nothing, this is what we will get.

  2. John Grono from GAP Research, May 1, 2023 at 8:35 p.m.

    Good article by the pair of Eds.

    No matter what you do, you won't have 'THE truth'. Taking AU as an example, we have compulsory voting and compulsory completion of our quinquennial population census.

    We're a pretty compliant mob, generally getting around 95% compliance. But does that mean that the population census, or even the election results, are 100% correct and are 'the truth'? No, it doesn't, but in all probability it does no damage to the actual truth, should it be achievable.

    The US MRC is basically doing the same thing ... ensuring that the ratings systems comply with the acceptable rules as defined by the JIC.

    Ed, while I agree that treating 'in-the-room' as an adequate definition of audience is wrong, the TV rating is not 100% based on "watched every bit of the program and commercial content so long as it remained on the screen". In AU, if a 'person' (well, their button on the ratings remote) has not been active within the prescribed time-frame (I can't recall what it is), it is assumed they have left the 'TV room' and all 'assumed viewing' is removed. Also, when you see an 'audience' for a program, it is based on the average minute (here in AU) and not on 'any assumed presence at any time during the program duration'. But I certainly do take your point. Just maybe, if advertisers contributed financially, we might be able to include 'attention' metrics.

  3. Ed Papazian from Media Dynamics Inc, May 2, 2023 at 3:28 a.m.

    John, the problem with the assumption that we are getting an indication of the program "viewer" being in the room---if not actually attentive---is that we aren't getting even that. Although the panel members---or household head---are instructed to press their buttons if they "stop viewing"---which can include leaving the room---and there are periodic on-screen prompts to remind them of this need, few actually bother. Accordingly, you may get 2-3% pressing their button per commercial break as your average result, when the real figure---per TVision and other observational studies---is 30-35% for commercials.

    We have to remember that an average Nielsen panel member is counted as "watching" 4.5-5 hours of TV---"linear" and streaming---per day and is in the panel for years on end. That's a lot of viewing---many thousands of occasions---and it's simply unrealistic to expect most panel members to obey instructions about telling the system every time they leave the room, then reaffirm their "viewing" when they return. Most of the time, they just don't do it.

    Before TVision came to the UK, it was assumed that the people meter system there was finding a loss of only 2-3% per commercial due to viewers not being present, as this is what the rating system was reporting---but when TVision finally set up a UK panel, they found exactly the same thing we have long noted in the U.S.---about a third of the presumed "audience" wasn't really there. I believe the same result would obtain in Australia.
