5 Ways to Root Out Your “Bad” Marketing Data

Last week, we talked about the real-world impact of bad marketing data and how it affects your bottom line.

With those implications in mind, where should you look for these pockets of bad data? Here are five of the most common culprits:

1. Bad campaign taxonomy

One of the most common ways marketing data gets corrupted is through undisciplined campaign taxonomy. Most digital campaigns are tracked with a unique string of characters that identifies the campaign in reporting. You apply taxonomy tags to identify campaigns, referring channels, and other information about your marketing activities, so that traffic can be categorized based on the content of the campaign tracking code. But different brand groups, geographies, and agencies each create their own way of naming campaigns. With so many fingers in the pie, aligning campaign taxonomies can become an exercise in contortion.

We’ve seen several ways brands attempt to fix or adjust their campaign taxonomy once users believe it has been corrupted. Some use data governance to enforce taxonomy rules, but the reality is that teams change, people come and go, and eventually those “rules” are broken, leaving you with questions about the data. Often, multiple teams end up cleaning the taxonomy in Excel independently, which creates redundant, disparate models and muddles things further.

As an example, maybe not all of your campaigns tag and track strategy properly. You may think you have brand campaign taxonomy under control, because you can say, “I’m spending $X on advertising, and we are getting these overall results.” What you can’t confirm is strategy attribution. You’re hard pressed to be certain that strategy A drove an overall increase or decrease in your brand dollars, because strategy A was tracked by only 3 brands out of 100, or 3 markets out of 100. In the end, with inconsistent taxonomy, the company doesn’t have a global picture of how strategy A is performing across brands, geographies, and channels.
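One practical way to spot taxonomy drift is to parse every tracking code against the agreed field structure and flag the codes that don’t fit. Here is a minimal sketch, assuming a hypothetical delimited format of `brand_geo_channel_strategy` (the field order and delimiter are assumptions; substitute your own taxonomy):

```python
# Sketch: parse delimited campaign tracking codes into labeled fields
# and flag codes that don't match the agreed taxonomy.
FIELDS = ["brand", "geo", "channel", "strategy"]

def parse_campaign_code(code: str, delimiter: str = "_") -> dict:
    """Split a code like 'acme_us_search_promoA' into labeled fields."""
    parts = code.strip().lower().split(delimiter)
    parsed = dict(zip(FIELDS, parts))
    # A code with the wrong number of segments can't be trusted in reporting.
    parsed["valid"] = len(parts) == len(FIELDS)
    return parsed

print(parse_campaign_code("Acme_US_Search_PromoA"))
```

Running this over all campaign codes gives you a quick census of which brands or markets are tagging strategy consistently and which are not.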


2. Short-term marketing campaign tests

Marketers often run one-day tests on their campaigns to evaluate an idea. The data collected isn’t “bad,” because you learned something, but that one-day spike skews the larger campaign trend. So it’s important to remove this data when evaluating trend performance.
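Excluding a known test day before computing a trend can be as simple as filtering it out. A minimal sketch, where the daily figures and the set of test dates are illustrative assumptions:

```python
from datetime import date

# Sketch: drop known one-day test dates before computing a trend average.
daily_conversions = {
    date(2020, 3, 1): 120,
    date(2020, 3, 2): 118,
    date(2020, 3, 3): 510,   # one-day creative test: a learning, not the trend
    date(2020, 3, 4): 125,
}
test_dates = {date(2020, 3, 3)}  # maintained per campaign

trend = [v for d, v in daily_conversions.items() if d not in test_dates]
avg = sum(trend) / len(trend)
print(round(avg, 1))  # trend average without the test spike
```

The test day stays available for its own analysis; it simply no longer distorts the baseline.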

3. Imprecise data

Spend is a fundamental measure behind many of your key performance metrics, such as cost per conversion. So knowing your spend is key, but some ad servers over-report it. For instance, on a 1 million impression buy, those million impressions are often guaranteed. However, the ad server doesn’t stop cold after delivering 1 million impressions. If it delivers 1.2 million, those last 200k are value-add. Free! But the ad server’s media cost is calculated as though you paid for all of them. The resulting metric is better described as “value delivered,” not “spend.” That over-delivery likely helped drive conversions, but remember, it is not part of your cost per conversion. There are best practices for correcting spend while maintaining a window into value delivered.
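The correction described above can be sketched as capping billable impressions at the contracted buy while keeping the over-delivery visible as a separate “value delivered” figure. The CPM and impression numbers below are illustrative assumptions:

```python
# Sketch: separate true spend (capped at the contracted buy) from the
# ad server's reported "value delivered".
def true_spend_and_value(impressions_delivered: int,
                         impressions_bought: int,
                         cpm: float) -> tuple[float, float]:
    billable = min(impressions_delivered, impressions_bought)
    spend = billable / 1000 * cpm                         # what you actually pay
    value_delivered = impressions_delivered / 1000 * cpm  # what the server reports
    return spend, value_delivered

spend, value = true_spend_and_value(1_200_000, 1_000_000, cpm=5.00)
print(spend, value)  # 5000.0 6000.0
```

Cost-per-conversion metrics should use `spend`; the gap between the two numbers is the free over-delivery.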

4. Mismatched marketing data

Then there is the issue of mismatched data. Like the cases above, one wouldn’t call it “bad data,” but it can alter results. A common example is changing data vendors, which can also leave holes in your data. Say you’ve been working with one search partner for a couple of years and switch to a new vendor because they offer something attractive: new, relevant features, better pricing, or simple convenience. These vendors aren’t identical in their reporting, so you will have varied spots and gaps of information. This incongruous information isn’t “bad” or poor quality; it just doesn’t let you compare apples to apples, and it leaves voids in your conclusions.

5. One-time marketing data pulls

Sometimes marketers find themselves pulling together marketing data from disparate systems for a special “one-off” analysis. Inevitably, each system provides a different level of detail and the information isn’t aligned. So you spend time trying to make sense of the spotty data, on top of all the time it took to collect the data and bring it together. As a result, any insights to be gained are delayed. The best practice, and what leads to sound conclusions, is recurring data loads set up with the same structure. Then all the information is at your fingertips any time you need it, and you are comparing apples to apples. If you have data coming in from a marketing system, assume that you will need it at some point and go ahead and bring it into the fold of your data analysis.

In all of these situations, as you pull away from exact information, your data loses its explanatory and predictive power. Marketers cherry-pick the data they like: the information they can understand and have the most of. Every professional is motivated to show success, so they lean into that information, but we learn as much (sometimes more) from what didn’t work as from what did. These human tendencies perpetuate the issues that lead to the “costs of bad data.”

When it comes to evaluating “bad,” spotty, or skewed data, there are three actions you can take to eliminate data that would lead you to erroneous conclusions and to give you more confidence:

  1. Business Rules – Create logic in your data analysis to normalize the information. The classic example is when people enter Yahoo / Yahoo! or Wal-mart / WalMart. A human recognizes these are the same, but you need to tell a machine to treat them as the same, creating like groups for analysis. These raw-to-clean rules are the simplest. Business rules also work well when a set of rules leverages purchase or invoice data to properly cap costs from the ad server, as noted above.
  2. Suppression – Another option is to suppress particular information, such as short-term test data, so that it doesn’t bias the rest of the dataset.
  3. Data Governance – This is a preventative measure, best used when you anticipate data discrepancies. For example, when you know in advance that data from a one-time event would bias the larger trend trajectory, you can put that information in a silo. Data governance is a huge topic that should not be minimized; for the purposes of this short blog, we are simply acknowledging it as one of three ways to curb information that would unfairly tilt your conclusions.
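The first remedy, a raw-to-clean business rule, can be sketched as a simple alias table that maps known name variants onto one canonical label before grouping. The alias table below is an assumption; in practice it grows as new variants appear in the data:

```python
# Sketch of a raw-to-clean business rule: map known name variants
# onto one canonical label so the machine groups them together.
ALIASES = {
    "yahoo!": "Yahoo",
    "yahoo": "Yahoo",
    "wal-mart": "Walmart",
    "walmart": "Walmart",
}

def normalize(name: str) -> str:
    """Return the canonical label for a known variant, else the input as-is."""
    key = name.strip().lower()
    return ALIASES.get(key, name.strip())

rows = ["Yahoo!", "yahoo", "Wal-Mart", "WalMart"]
print([normalize(r) for r in rows])  # ['Yahoo', 'Yahoo', 'Walmart', 'Walmart']
```

Unrecognized names pass through unchanged, so new variants surface in reporting and can be added to the rule rather than silently dropped.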

All of the scenarios above assume you have a complete view of your data and know where to look for the anomalies and erroneous information that would steer your strategy in the wrong direction. But what happens when you are relying on quantitative information and you just don’t trust your data? You know something is amiss, but can’t find the information to isolate it? Next week, we’ll dig into what happens when decisions are made on untrusted data.


Photo by gdtography on Unsplash
