Track how users read web content with Google Analytics

Funnel visualisation of reading info events in Google Analytics

My website exists mainly to share knowledge. Even though this article is only the second one I have published here, I can safely say the blog section is the main part of this site.

There are many more websites that exist mainly to provide (readable) content, and probably even more that have other primary goals but still have content they want their users to read. So we may deduce that there are people out there interested in whether their web content is being read.

Still, there is not really a standard way to track how content is being read on the internet.

You sometimes see a "Read more" button that you need to click to reveal the content. This click can be measured as intent to read more. But having to notice and click a button before all content is shown doesn't exactly enhance the user experience.

Sites that perform tracking with Google Analytics and/or Google Tag Manager often implement scroll tracking. It's very easy to turn on, but standard scroll tracking measures what percentage of the entire page has been in the viewport, while we're usually interested in only a certain part of the page, such as the article body. So a percentage of the full page is not really of interest. On top of that, scroll tracking by itself doesn't take time into account: is someone just skimming through my article quickly, or actually reading it?

In this article I describe how I track reading of web content. I've named the method reading info.

Inspiration for reading info

First I want to credit the two articles that were my main sources of inspiration when working on this subject. My contribution is tiny compared to the work already done by the writers of these articles. I simply made a few tweaks to their ideas (and code).

  • Track content with Enhanced Ecommerce - Simo Ahava

    This is one of my favourite articles on Simo's blog. I especially admire the spark of creativity that led to applying the Enhanced Ecommerce plugin in a new context. I have already implemented the part of his solution that deals with an actual article page (from product view to purchase of an article) on a few websites. If a website has no actual ecommerce, this is a very convenient way to make use of all the ecommerce reporting options available in Universal Analytics. I expect to use the other parts of Simo's solution as a basis to expand the reading info data model at some point in the future.

  • Use the Intersection Observer API for analytics events - Stephanie Eckles

    I implement reading info in two ways. The preferred method is entirely based on Stephanie's use of the Intersection Observer API. Whenever I can get the website developers to add HTML elements that mark milestone points, like the midpoint and endpoint of an article, I use this method. Only when that is not feasible do I fall back to doing calculations on scroll events.

If you reach the endpoint of this article, you'll have scrolled past the code for both of the methods I use. Before getting to the analytics implementation, I explain the data model for reading info and provide some basic reports that can be created if you use one of the scripts.

The data model

Taking Simo's solution as a starting point, we can view the reading process as a funnel with certain milestones. But for sites that use ecommerce events for something else (for example ACTUAL ecommerce) it makes sense not to use the reserved ecommerce events for reading. In any case, it's more logical to name events that track content after what they actually are, for example 'scroll_article' instead of 'add_to_cart' and 'reach_checkpoint_article' instead of 'checkout'.

Good thing that Google Analytics introduced GA4, which allows the flexibility of naming our own events. It also lets us create custom funnels based on those events. In Universal Analytics the former wasn't available at all, and the latter wasn't available to non-360 users. Enhanced Ecommerce did provide default funnel reports, which was another benefit of applying the ecommerce plugin to content tracking. Those workarounds are not needed anymore, so it is time to give content tracking its own data model.

The milestones that are tracked by reading info are:

  1. Loading an article page
  2. Starting to scroll through the content
  3. Reaching the midpoint of the content
  4. Reaching the end of the content
  5. Reaching the end of the content AND spending a certain amount of time on the page, an approximation of the fact that someone actually read the entire content

The general page_view event will capture the first one; the other milestones all have their own event in reading info.

The last one requires some explanation: a website user could scroll through an article very quickly without reading it, which is not the same as reading the entire piece. So to reach the final milestone called 'read_article', a user has to both reach the end of the content and spend enough time on the page. Enough time means long enough that it's possible the user read everything. This is of course still not perfect, but better than only looking at scrolling. The time required is calculated based on the number of words in the article and a chosen reading speed.
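
The time calculation can be sketched as a small helper. This is only an illustration, assuming the 400 words per minute used by the scripts later in this article; the function name estimateReadingSeconds is hypothetical and does not appear in those scripts.

```javascript
// Hypothetical helper: seconds needed to read `wordCount` words
// at `wordsPerMinute` (assumed default: 400, as in the scripts below).
function estimateReadingSeconds(wordCount, wordsPerMinute) {
  var speed = wordsPerMinute || 400;
  return Math.round((wordCount / speed) * 60);
}

// A 1,200-word article at 400 words per minute needs 180 seconds.
var secondsNeeded = estimateReadingSeconds(1200);
```

A lower reading speed makes the final milestone stricter: at 200 words per minute the same article would require 360 seconds on the page.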

For all events in reading info we can add more context through event parameters. At a minimum, I create a dimension called article_page. This dimension makes it easy to identify the pages that have read tracking during analysis and reporting, and to separate them from other pages on the domain. More context about the content can be added by creating other dimensions, e.g. article_title, author, word_count, primary_category and time_spent (on the page until the event happens).
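
As a rough sketch, a dataLayer push with such parameters could look like the object built below. The helper name buildReadingInfoEvent and all example values are made up for illustration; the parameter names are the ones suggested above. Each parameter would still need to be picked up in GTM and registered as a custom dimension or metric in GA4 before it shows up in reports.

```javascript
// Hypothetical helper that builds a reading info event with extra context.
function buildReadingInfoEvent(name, context, timeSpent) {
  return {
    event: "reading_info_" + name,
    article_page: true,
    article_title: context.title,
    author: context.author,
    word_count: context.wordCount,
    primary_category: context.category,
    time_spent: timeSpent
  };
}

// Example values are invented for this sketch.
var exampleEvent = buildReadingInfoEvent(
  "reach_midpoint_article",
  { title: "Example article", author: "Jane Doe", wordCount: 1200, category: "analytics" },
  42
);
// In the browser: window.dataLayer.push(exampleEvent);
```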

Knowing what we want to track, let's check out how it can look once some data is collected.

Basic exploration reports from reading info events

Applying the basics of the data model described above allows looking at content in a funnel visualisation.

Funnel exploration of reading info events

As a starting point, I like to create an overall funnel visualisation for all article pages on a website and an individual article breakdown table.

Table with breakdown of reading info events on article

Note that the sum of users for individual articles is higher than the total number of users. This simply shows that some users had reading info events for both pieces of content.
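
A toy example with made-up user IDs and article names shows why the per-article counts can add up to more than the total: a user who reads two articles is counted once per article, but only once overall.

```javascript
// Made-up sample: each row is one user who triggered reading info events on an article.
var rows = [
  { user: "u1", article: "article-a" },
  { user: "u2", article: "article-a" },
  { user: "u2", article: "article-b" },
  { user: "u3", article: "article-b" }
];

// Count the distinct values of `key` in a list of rows.
function countDistinct(list, key) {
  var seen = {};
  list.forEach(function (row) { seen[row[key]] = true; });
  return Object.keys(seen).length;
}

// Distinct users with events on one article.
function usersForArticle(article) {
  var matching = rows.filter(function (row) { return row.article === article; });
  return countDistinct(matching, "user");
}

var totalUsers = countDistinct(rows, "user"); // u1, u2, u3
var sumPerArticle = usersForArticle("article-a") + usersForArticle("article-b");
```

Here user u2 appears under both articles, so the per-article sum is one higher than the number of distinct users.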

Now it's time to look at the tracking needed to be able to create these reports.

The code, using Intersection Observer

Let's assume reading info is implemented with Google Tag Manager. The main script should be triggered on DOM ready (the script scrapes the DOM, so the DOM needs to be ready) on article pages only. One way to identify article pages could be to create a variable that returns true when an element with id midpoint is found in the DOM and false otherwise.


function() {
  return !!document.getElementById('midpoint');
}

          

Then trigger the script on DOM Ready events where that variable equals true.

GTM trigger configuration to fire on dom ready when is_article_page equals true

The script selects the points to observe, counts the words in the article and calculates the required reading time based on a reading speed of 400 words per minute. It pushes dataLayer events for the first scroll, the midpoint entering the user's viewport and the endpoint entering the viewport. When the endpoint has been reached and the calculated reading time has passed, the read_article event is also pushed to dataLayer.


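(function() {
  var article = document.getElementsByTagName("article")[0];
  var midpoint = document.getElementById("midpoint");
  var endpoint = document.getElementById("endpoint");
  // Stop if the markup this script depends on is missing.
  if (!article || !midpoint || !endpoint) return;

  // Split on any whitespace so words separated by line breaks are counted too.
  var wordCount = article.innerText.split(/\s+/).filter(Boolean).length;
  // Seconds needed to read the article at 400 words per minute.
  var timeNeeded = Math.round((wordCount / 400) * 60);

  // Seconds since navigation to the page started.
  var getTimeSpent = function() { return Math.round(performance.now() / 1000); };

  var pushEvent = function(event) {
    window.dataLayer.push({
      event: "reading_info_" + event,
      timeSpent: getTimeSpent()
    });
  };

  var articleObserver = new IntersectionObserver(function(entries) {
    // Both points can show up in the same batch (e.g. on short articles),
    // so handle every entry instead of only the first one.
    entries.forEach(function(point) {
      if (!point.isIntersecting) return;
      articleObserver.unobserve(point.target);
      var eventName = point.target.id;
      var timeSpent = getTimeSpent();
      pushEvent("reach_" + eventName + "_article");

      if (eventName === "endpoint") {
        if (timeSpent >= timeNeeded) {
          pushEvent("read_article");
        } else {
          // Push read_article as soon as the remaining reading time has passed.
          var timeLeft = (timeNeeded - timeSpent) * 1000;
          setTimeout(pushEvent, timeLeft, "read_article");
        }
      }
    });
  });

  // The first scroll on the page marks the start of reading.
  window.addEventListener(
    "scroll",
    function() {
      pushEvent("scroll_article");
    },
    { once: true }
  );

  articleObserver.observe(midpoint);
  articleObserver.observe(endpoint);
})();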

To pass the dataLayer events along to Google Analytics, create an event tag that triggers on any custom event that starts with reading_info. Set the event name to a variable that takes GTM's built-in event variable and returns its value with the beginning (reading_info_) stripped:


function() {
  var eventName = {{Event}};
  return eventName.slice(13);
}
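
A slightly more defensive variant is sketched below as a plain function, so it can be tried outside GTM. Inside GTM the same logic would live in a Custom JavaScript variable with {{Event}} as its input. The name stripReadingInfoPrefix is hypothetical.

```javascript
// Hypothetical standalone version of the variable above.
// Strips the prefix only when it is actually present.
function stripReadingInfoPrefix(eventName) {
  var prefix = "reading_info_";
  if (typeof eventName !== "string") return eventName;
  return eventName.indexOf(prefix) === 0
    ? eventName.slice(prefix.length)
    : eventName;
}

var gaEventName = stripReadingInfoPrefix("reading_info_read_article");
```

Using prefix.length instead of a hard-coded 13 keeps the variable correct if the prefix is ever renamed.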

          

The code, using scroll events only

When we do not have the midpoint and endpoint elements in the DOM, we need to approach read tracking slightly differently. Now the scroll event provided by the browser is our main source of information. When the user scrolls, we need to calculate how much of the article has been in the viewport and decide whether we should push an event to dataLayer.
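
The calculation can be sketched as a pure function of the values that getBoundingClientRect() and window.innerHeight provide. The function name articleProgress is hypothetical, and the script further down compares pixel distances directly instead of computing a fraction.

```javascript
// Hypothetical helper: fraction (0 to 1) of the article that has passed
// the bottom edge of the viewport.
// rectTop and rectHeight come from element.getBoundingClientRect(),
// viewportHeight comes from window.innerHeight.
function articleProgress(rectTop, rectHeight, viewportHeight) {
  if (rectHeight <= 0) return 0;
  var reached = viewportHeight - rectTop;
  return Math.min(1, Math.max(0, reached / rectHeight));
}

// An article of 2000px whose top sits 200px above the top of an 800px viewport:
// (800 - (-200)) / 2000 = 0.5, so the midpoint has just been reached.
var progress = articleProgress(-200, 2000, 800);
```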

Note that the scroll event fires very often, so we don't want to do those calculations on every scroll event. We want to be mindful of website performance. Therefore we use a timeout in our scroll listener. Every new scroll event clears the timeout and sets a new one. Only when the timeout expires are the calculations triggered.
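
This pattern is commonly known as debouncing. A generic sketch, separate from the script below, could look like this; the helper name debounce and the 100 ms wait in the usage comment are illustrative assumptions.

```javascript
// Generic debounce sketch: `fn` runs only after `wait` ms have passed
// without a new call. Every new call resets the timer.
function debounce(fn, wait) {
  var timer;
  return function () {
    if (timer) {
      clearTimeout(timer);
    }
    timer = setTimeout(fn, wait);
  };
}

// Usage in the browser, where trackLocation does the calculations:
// document.addEventListener("scroll", debounce(trackLocation, 100));
```

A burst of scroll events therefore results in a single calculation, shortly after the user pauses scrolling.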

This is the script to use in this situation:


(function () {
  var timer;
  var callBackTime = 100; // debounce delay in milliseconds

  // Selector for the element that holds the readable content.
  var contentArea = "article";
  var article = document.querySelector(contentArea);
  if (!article) return;

  // Split on any whitespace so words separated by line breaks are counted too.
  var wordCount = article.innerText.split(/\s+/).filter(Boolean).length;
  // Seconds needed to read the article at 400 words per minute.
  var timeNeeded = Math.round((wordCount / 400) * 60);

  // Seconds since navigation to the page started.
  var getTimeSpent = function() { return Math.round(performance.now() / 1000); };

  var pushEvent = function(event) {
    window.dataLayer.push({
      event: "reading_info_" + event,
      timeSpent: getTimeSpent()
    });
  };

  // Flags that make sure every milestone is pushed only once.
  var scroller = false;
  var half = false;
  var endContent = false;

  function trackLocation() {
    var articleRect = article.getBoundingClientRect();
    // Distance between the bottom of the article and the bottom of the viewport.
    var bottom = articleRect.bottom - window.innerHeight;
    var halfOfArticle = articleRect.height / 2;

    if (!scroller) {
      pushEvent("scroll_article");
      scroller = true;
    }

    if (bottom < halfOfArticle && !half) {
      pushEvent("reach_midpoint_article");
      half = true;
    }

    if (bottom <= 0 && !endContent) {
      pushEvent("reach_endpoint_article");
      endContent = true;
      // Push read_article now, or as soon as the reading time has passed,
      // even if the user does not scroll again after reaching the end.
      var timeLeft = Math.max(0, timeNeeded - getTimeSpent()) * 1000;
      setTimeout(pushEvent, timeLeft, "read_article");
    }
  }

  document.addEventListener("scroll", function () {
    if (timer) {
      clearTimeout(timer);
    }
    // Only run the calculations once scrolling has paused for callBackTime ms.
    timer = setTimeout(trackLocation, callBackTime);
  });
})();

The variables and triggers mentioned in the previous section can all stay the same to complete the implementation.

Wrap up

After implementing reading info with either of these methods you can get some basic insight into how users engage with your articles. The scripts can be easily adjusted to measure in more detail. You can add more milestones and track chunks of 25%. You can add more event parameters to add more context about your articles. Add a metric words_read to the read_article event to be able to segment your users on the basis of how much they read. Add a primary_category dimension to see if certain topics are read more than others. Or even add other events such as article_impression and article_click to measure click-through ratios of content.
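
As an illustration of the 25% idea, the milestone check could be generalised along these lines. This is only a sketch: the names MILESTONES and newMilestones are hypothetical, and the progress value is assumed to be the fraction of the article scrolled so far (0 to 1).

```javascript
// Hypothetical milestone percentages to report.
var MILESTONES = [25, 50, 75, 100];

// Given the fraction of the article scrolled (0 to 1) and the milestones
// already reported, return the milestones that should be reported now.
function newMilestones(progress, alreadyFired) {
  var percent = progress * 100;
  return MILESTONES.filter(function (milestone) {
    return percent >= milestone && alreadyFired.indexOf(milestone) === -1;
  });
}

// 25% was reported earlier and the user is now 80% through the article,
// so the 50% and 75% milestones are due.
var fired = [25];
var toFire = newMilestones(0.8, fired);
```

Each returned milestone could then be pushed as its own dataLayer event and added to the list of fired milestones.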

There are so many possibilities that this example of reading info might only be the beginning of your content tracking journey.