How Does Google Analytics Collect Data?

First in a series of articles explaining how Google Analytics works and how to get the most out of it.

Many of my clients already use Google Analytics to gather information on how users interact with their website. However, many of them do not realise how much power Google Analytics actually has, or quite how useful it can be for gaining insights into their visitors’ habits. I thought I would share some of the knowledge I have gleaned over the last few years of carrying out Search Engine Optimisation, and the role that Google Analytics can play in helping you to achieve your online goals.

In this first part of the series I will briefly run over the way that Google actually collects the data from your site, which it then uses to produce the reports that you check on a daily or weekly basis.

The first thing you need to know is that the piece of code you pasted into your pages is called the Google Analytics Tracking Code (GATC). The GATC is a small piece of JavaScript, and you need to add it to every page that you want to collect data on. If you are using a content management system (CMS) then you will just need to add it to your page template(s).

When a user visits your site, their browser sends a request to your site’s server and the server sends back the requested page. The page includes the GATC, which is triggered when the page loads in the user’s browser. Here is the next bit of info for you to amaze your friends with on Friday night. The JavaScript that you pasted into your page is not all the code that is needed; it pulls more code in from elsewhere. “Where exactly?” you might ask. Well, here is the clever bit. If you live in Blackburn it will pull the code in from the nearest Google server to Blackburn. If you are in Leeds, it picks the nearest Google server to Leeds. This is done to help minimise any performance issues that might arise while waiting for the code to download. OK, you might not want to tell your friends about this on Friday, but it is a rather interesting fact.
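For reference, the asynchronous version of the tracking code from the ga.js era looks roughly like the sketch below. The account ID 'UA-XXXXX-X' is a placeholder, and both the gaScriptUrl helper and the typeof document guard are my own additions so that the snippet can also run outside a browser. Treat it as an illustration rather than the exact code Google issues for your account.

```javascript
// Command queue: commands wait here until ga.js has loaded.
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-XXXXX-X']); // placeholder account ID
_gaq.push(['_trackPageview']);            // record a view of this page

// Hypothetical helper: work out where the rest of the code (ga.js) comes from.
function gaScriptUrl(protocol) {
  return (protocol === 'https:' ? 'https://ssl' : 'http://www') +
    '.google-analytics.com/ga.js';
}

// Guard added for illustration: only touch the page when running in a browser.
if (typeof document !== 'undefined') {
  var ga = document.createElement('script');
  ga.type = 'text/javascript';
  ga.async = true; // do not hold up the page while ga.js downloads
  ga.src = gaScriptUrl(document.location.protocol);
  var first = document.getElementsByTagName('script')[0];
  first.parentNode.insertBefore(ga, first);
}
```

Notice that the snippet only ever names one hostname. Picking a server close to the visitor is handled on Google’s side and is not something you will see in the code you paste.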

Next, the code collects a whole host of information about your visitor from their browser, such as the operating system, the browser type and version, the screen resolution and the language setting.
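To give a feel for the kind of detail a script can read from the browser, here is a minimal sketch. The collectBrowserInfo helper is hypothetical and is not how Google’s own code is written; it simply reads a few standard properties of the browser’s navigator and screen objects, which are passed in so the example also runs outside a browser.

```javascript
// Hypothetical helper: gather a few details a script can read from the browser.
// `nav` and `scr` stand in for the browser's `navigator` and `screen` objects.
function collectBrowserInfo(nav, scr) {
  return {
    userAgent: nav.userAgent,   // identifies the browser and operating system
    language: nav.language,     // preferred language, e.g. "en-GB"
    screenResolution: scr.width + 'x' + scr.height,
    colorDepth: scr.colorDepth + '-bit'
  };
}

// Example with made-up values; in a page you would pass `navigator` and `screen`.
var info = collectBrowserInfo(
  { userAgent: 'Mozilla/5.0 (Windows NT 6.1) ExampleBrowser/1.0', language: 'en-GB' },
  { width: 1280, height: 800, colorDepth: 24 }
);
console.log(info.screenResolution); // prints "1280x800"
```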

The GATC then sets or resets various cookies on the user’s machine. These store data that can be read again if the visitor ever returns to your site. If cookies cannot be set then Google Analytics cannot track your visitor. All the cookies used by Google Analytics are first-party cookies, which means that they are set by your website and belong to your domain. There are also third-party cookies; these belong to one site but are set while the visitor is on another. Another point to note is that each browser keeps its own separate set of cookies. So if a visitor uses Internet Explorer and at some point revisits your site using, say, Safari, two lots of cookies will be set even though the same machine has been used.
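If you have never seen a cookie being set from JavaScript, the sketch below shows the general idea. The buildCookie and readCookie helpers, the cookie name visitor_id and the lifetime are all made up for illustration and are not taken from Google’s code. Because no domain is named, a browser would store the cookie against the site the page came from, which is what makes it first-party.

```javascript
// Hypothetical helper: build the string a script assigns to document.cookie.
function buildCookie(name, value, daysToLive, now) {
  var expires = new Date(now.getTime() + daysToLive * 24 * 60 * 60 * 1000);
  return name + '=' + encodeURIComponent(value) +
    '; expires=' + expires.toUTCString() + '; path=/';
}

// Hypothetical helper: look up one cookie in an "a=1; b=2" style string.
function readCookie(cookieString, name) {
  var parts = cookieString.split('; ');
  for (var i = 0; i < parts.length; i++) {
    var eq = parts[i].indexOf('=');
    if (parts[i].substring(0, eq) === name) {
      return decodeURIComponent(parts[i].substring(eq + 1));
    }
  }
  return null; // not present, e.g. a first visit or cookies disabled
}

// In a page (the 730 days is an arbitrary example lifetime):
//   document.cookie = buildCookie('visitor_id', '12345', 730, new Date());
// and on a later visit:
//   readCookie(document.cookie, 'visitor_id');
```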

There are five cookies that can be set by Google Analytics:

  • __utma: This is a visitor identification cookie. It contains a unique numerical identifier and GA counts these up to see how many unique visitors a site has received
  • __utmb: This is a session identifier and is used in conjunction with __utmc to calculate things such as time on page or time on site
  • __utmc: Used as already described for __utmb
  • __utmz: This helps to track where a visitor came from and is useful for helping track keywords and referral traffic
  • __utmv: This is used with custom variables that you can set up to store additional information about your site’s users
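To make the visitor identification cookie a little more concrete, here is a rough sketch that splits a __utma value into its parts. The six-field layout used here is the one I have seen commonly described for ga.js and is an assumption on my part; the parseUtma helper and the example value are made up.

```javascript
// Assumed layout of a __utma value (an assumption, not confirmed here):
// domainHash.visitorId.firstVisit.previousVisit.currentVisit.sessionCount
function parseUtma(value) {
  var f = value.split('.');
  if (f.length !== 6) {
    return null; // not the layout this sketch expects
  }
  return {
    domainHash: f[0],
    visitorId: f[1],             // the unique numerical identifier
    firstVisit: Number(f[2]),    // timestamps, in seconds
    previousVisit: Number(f[3]),
    currentVisit: Number(f[4]),
    sessionCount: Number(f[5])
  };
}

// Made-up example value, for illustration only.
var utma = parseUtma('123456789.1122334455.1300000000.1300500000.1301000000.3');
console.log(utma.visitorId, utma.sessionCount); // prints "1122334455 3"
```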

Once these cookies have been set, the data is sent back to Google to be stored.
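As far as I am aware, ga.js sends the data back by requesting a tiny image called __utm.gif from Google, with the collected values attached to the address as parameters. The sketch below shows the general shape of such a request. The buildBeaconUrl helper is hypothetical, the values are placeholders, and the handful of parameter names shown are only a small sample, so do not read this as the full format.

```javascript
// Hypothetical helper: attach collected values to an image request address.
function buildBeaconUrl(baseUrl, params) {
  var pairs = [];
  for (var key in params) {
    if (Object.prototype.hasOwnProperty.call(params, key)) {
      pairs.push(encodeURIComponent(key) + '=' + encodeURIComponent(params[key]));
    }
  }
  return baseUrl + '?' + pairs.join('&');
}

var beacon = buildBeaconUrl('http://www.google-analytics.com/__utm.gif', {
  utmac: 'UA-XXXXX-X',      // placeholder account ID
  utmhn: 'www.example.com', // host name of the site
  utmp: '/about-us/',       // page that was viewed
  utmsr: '1280x800'         // screen resolution
});
console.log(beacon);
// In a page, requesting the image is what sends the data:
//   new Image().src = beacon;
```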

There are a couple of potential problems with this method. Firstly, because it relies on cookies, you can get an inflated number of unique visitors if people use different browsers on different visits. Secondly, a visitor may have disabled cookies, in which case they will not register in your analytics results at all.
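Both problems are easy to see with a little made-up data. The sketch below counts unique visitors the way any cookie-based tool has to, by counting distinct identifiers. The countUniqueVisitors helper and the visit records are hypothetical and are only there to show the effect.

```javascript
// Count unique visitors as a cookie-based tool sees them:
// one visitor per distinct identifier, and no identifier means no record.
function countUniqueVisitors(visits) {
  var seen = {};
  var count = 0;
  for (var i = 0; i < visits.length; i++) {
    var id = visits[i].visitorId;
    if (id === null) continue; // cookies disabled: the visit is never tracked
    if (!seen[id]) {
      seen[id] = true;
      count++;
    }
  }
  return count;
}

// One person, two browsers: each browser holds its own cookie.
var twoBrowsers = [
  { person: 'Alice', browser: 'Internet Explorer', visitorId: '1001' },
  { person: 'Alice', browser: 'Safari', visitorId: '2002' }
];
console.log(countUniqueVisitors(twoBrowsers)); // prints 2, for one real person

// One person with cookies disabled: no identifier is ever stored.
var cookiesOff = [
  { person: 'Bob', browser: 'Firefox', visitorId: null }
];
console.log(countUniqueVisitors(cookiesOff)); // prints 0, for one real person
```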

There is also the possibility that sites in the EU will have an additional problem by the end of May 2011. Under the EU’s Privacy and Electronic Communications Directive, site owners will have to get explicit consent from visitors before they can store information on, or retrieve it from, a visitor’s machine. The cookies used by Google Analytics are exactly the sort of thing that the EU is talking about. You do not need to panic if you are using Google Analytics, as the legislation will not be enforced immediately, but it is something that you should be aware of. Now that is something to share with your friends this Friday!