Our relationship with data is changing. Each day brings more connections, integrations, and tools to satisfy our biggest data dreams. Omni-channel marketing, the Internet of Things, and real-time reporting bring the promise of Big Data with revolutionary insights. But before we can go big, we need to address a common issue affecting our data: it’s dirty. From bot traffic to self-referrals, our data isn’t always as clean as we would like. Below are five common culprits of dirty data and how to fix them:

UTM Code Errors

This can be one of the simplest mistakes, but it can also be one of the most frustrating when it comes time to report on your data. Spelling errors, changes in capitalization, and inconsistent tagging can all affect how your data appears in Google Analytics. For example, a link tagged with “utm_source=twitter&utm_medium=social” will appear as an entirely different source/medium than “utm_source=Twitter&utm_medium=Social”. To avoid dealing with some unnecessary math as you report on monthly performance, stay consistent when it comes to your UTM codes.
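
One way to stay consistent is to normalize links before they are published. The snippet below is a minimal sketch in Python, not a Google Analytics feature: the function name normalize_utm and the example URL are our own assumptions, and it simply lowercases and trims any utm_ parameters it finds.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_utm(url: str) -> str:
    """Lowercase and trim utm_* parameters; leave every other parameter alone.

    Hypothetical helper for illustration; not part of any analytics library.
    """
    parts = urlsplit(url)
    cleaned = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key.lower().startswith("utm_"):
            key, value = key.lower(), value.strip().lower()
        cleaned.append((key, value))
    return urlunsplit(parts._replace(query=urlencode(cleaned)))


print(normalize_utm("https://example.com/page?utm_source=Twitter&utm_medium=Social"))
# prints https://example.com/page?utm_source=twitter&utm_medium=social
```

Running every campaign link through a helper like this before it goes out means “Twitter” and “twitter” can never end up as two separate rows in your source/medium report.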


Self-Referrals

Self-referrals are any sessions in the referral report that are attributed to your own domain. The trouble with these pesky referrals is that they leave out an important piece of information: where this traffic is actually coming from. What causes the issue? Two common causes are missing tracking code on certain pages and redirects on your site that drop campaign parameters. To address the issue, start by reviewing the landing pages of your self-referrals to identify any trends, then look into all redirects to ensure that parameters are being passed.
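
If you export your referral report, the landing page review can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the field names source, medium, and landing_page describe a hypothetical export format, and self_referral_landing_pages is a made-up helper, not an analytics API.

```python
from collections import Counter


def self_referral_landing_pages(sessions, own_domain):
    """Count landing pages for referral sessions attributed to your own domain.

    `sessions` is assumed to be a list of dicts with the hypothetical keys
    "source", "medium", and "landing_page".
    """
    own = own_domain.lower()
    counts = Counter()
    for session in sessions:
        source = session["source"].lower()
        # Treat the bare domain and any of its subdomains as "self".
        is_self = source == own or source.endswith("." + own)
        if session["medium"] == "referral" and is_self:
            counts[session["landing_page"]] += 1
    # Most frequent landing pages first.
    return counts.most_common()
```

Landing pages that show up at the top of this list are a reasonable place to start checking for missing tracking or a redirect that strips parameters.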

Duplicates of the Same Page

Anyone coming to your website has the potential to affect your data. For example, a visitor to your site could type in your URL as “yourwebsite.com/TEST” while another user could type “yourwebsite.com/test”. Analytics will record these two variations of the request URI (the part of the URL after your domain that differentiates pages on your site) as two separate pages in your report (“/test” and “/TEST”). What can you do about it? While there is no way to fix historical data inside Google Analytics, you can correct this issue moving forward. Depending on how you report on your data, it may be worth setting up a lowercase request URI filter so that all casing variations appear as lowercase.
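
The effect of a lowercase filter is easy to picture with a small Python sketch. It works on an exported report rather than on Google Analytics itself, so it does not change what is stored in your account; the function name merge_case_variants and the pageview numbers are hypothetical.

```python
from collections import Counter


def merge_case_variants(pageviews):
    """Combine pageview counts for paths that differ only by letter case.

    `pageviews` is assumed to map a request URI to its pageview count.
    """
    merged = Counter()
    for path, views in pageviews.items():
        merged[path.lower()] += views
    return dict(merged)


report = {"/test": 10, "/TEST": 3, "/about": 5}  # made-up numbers
print(merge_case_variants(report))
# prints {'/test': 13, '/about': 5}
```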

Internal & Testing Traffic

As you set up Google Analytics, it’s important to remember to exclude both internal traffic (visits from your own home or company network) and testing traffic (visits to your internal or agency testing site). If you missed this step or something has changed since, you may begin to see some unwanted traffic in your analytics. How can you fix it? To exclude internal traffic, set up an IP exclusion filter, or add to your existing list if you already have one. For testing traffic, we recommend setting up an advanced segment to remove any testing hostnames from your reports. Because a segment is applied when the report is viewed, it lets you see all current and historical data without having to manually sort through invalid traffic.
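
The logic behind both exclusions can be sketched in Python for anyone cleaning raw hit data outside of Google Analytics. Everything named here is an assumption for illustration: the ip and hostname fields, the helper names, and the address ranges (which come from the blocks reserved for documentation) are not taken from any real setup.

```python
import ipaddress


def is_internal(ip, internal_ranges):
    """Return True if `ip` falls inside any of the listed addresses or CIDR ranges."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(r) for r in internal_ranges)


def keep_hit(hit, internal_ranges, testing_hostnames):
    """Keep a hit only if it is neither internal nor from a testing hostname.

    `hit` is assumed to be a dict with the hypothetical keys "ip" and "hostname".
    """
    if is_internal(hit["ip"], internal_ranges):
        return False
    return hit["hostname"].lower() not in testing_hostnames


# Placeholder values; replace with your own office ranges and testing sites.
office_ranges = ["203.0.113.0/24", "198.51.100.7"]
testing_sites = {"staging.yourwebsite.com"}

hit = {"ip": "192.0.2.10", "hostname": "www.yourwebsite.com"}
print(keep_hit(hit, office_ranges, testing_sites))
# prints True
```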


Spam Traffic

Organic, direct, referral – spam traffic can affect metrics relating to all of these mediums and more. You may be familiar with this phenomenon by now. Sites like “semalt.com” or “free-social-buttons.xyz” have been plaguing website owners and business analysts for years. While the majority of spam used to come through under the guise of referral traffic, in recent years it has begun to appear in the form of organic keywords and, even worse, direct traffic.

So how do you fix the issue? Some types of traffic can be a bit trickier than others. Direct traffic, for example, is difficult to identify as being spam since everything falls within the same source/medium. Referral spam, on the other hand, can be more easily pinpointed. While it can be difficult to resolve the issue of spam traffic altogether, setting up a filter or segment in your Google Analytics view can help relieve your headache. As with any big changes to your analytics, it’s important to consult an expert first. Also, remember to keep a single unfiltered Google Analytics view and test any filter before it’s applied.
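
For referral spam, a filter usually comes down to a pattern that matches the offending domains. The Python sketch below shows one way to build and test such a pattern before relying on it; the helper names are hypothetical, the domain list contains only the two examples mentioned above, and the pattern syntax your analytics tool accepts may differ from Python’s.

```python
import re


def build_spam_pattern(domains):
    """Build a regex matching each listed domain and any of its subdomains."""
    if not domains:
        raise ValueError("at least one spam domain is required")
    # re.escape keeps the dots in domain names from acting as wildcards.
    alternatives = "|".join(re.escape(d.lower()) for d in domains)
    return re.compile(rf"(?:^|\.)(?:{alternatives})$")


def is_referral_spam(source, pattern):
    """Return True if a referral source matches the spam pattern."""
    return pattern.search(source.lower()) is not None


pattern = build_spam_pattern(["semalt.com", "free-social-buttons.xyz"])
for source in ["semalt.com", "buttons.semalt.com", "google.com"]:
    print(source, is_referral_spam(source, pattern))
```

Checking the pattern against a handful of known good and known bad sources first is the same idea as testing a filter on a separate view before applying it everywhere.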

When the time comes to review your monthly performance, keep these five common causes of dirty data in mind and you’ll work your way towards cleaner, headache-free data.

If your mom saw your analytics data, would she tell you to clean your room? Check out our Analytics and Measurement capabilities to see how we can help you out.