VGTech is a blog where the developers and devops of Norway’s most visited website share code and tricks of the trade.




Visualizing the most read articles on VG


D3. Behind this name is a pretty neat concept called Data-Driven Documents. I took a look at the framework last year after seeing a lot of cool demos using it. It’s really flexible and not tied to a specific form of presentation – you can use D3 to generate an HTML table from an array of numbers, or use the same data to create an interactive SVG bar chart with smooth transitions.

After looking through the different layout algorithms available in D3, I found the treemap algorithm particularly interesting. I’ve seen treemaps used before in both profiling tools and disk usage analyzers, and found them very effective for visualizing the relative size of numbers. An idea popped into my head: “Maybe this can be used to visualize which articles are being read the most?” I decided to give it a try.

Fetching the data

The easy way to do this would be to export some data from our analytics systems every once in a while. It felt a little too static, in my mind – I wanted something more dynamic, preferably with real-time data.

We’re using Varnish Cache at VG, so we can’t just parse the webserver logs (requests answered from the cache never reach the web servers). However, using a tool called vstatd/VCS (Varnish Custom Statistics), we can log hits under different keys, filter on those keys and sort them by the number of hits. This gives us real-time statistics, and also lets us go a couple of minutes back in time, depending on the bucket size and the number of buckets set up in our configuration.

Every time someone reads one of our articles, they will hit Varnish, which will assign a hash to the request containing the article ID – let’s say ARTICLE-<ID>. We can then filter on keys starting with ARTICLE- and sort the results in descending order by the number of hits. For every article in the top list, we fetch some basic article information, such as the title of the article, the category and the “lead asset” (usually the main article image). I’ve written a simple node.js application that performs these steps and polls for new information every 5 seconds.
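
As a rough sketch of the filter-and-sort step described above: the function below assumes the statistics have already been fetched into a plain object that maps each key to its hit count. That input shape, the function name topArticles and the limit parameter are illustrative assumptions, not the actual VCS response format or the code used in the application.

```javascript
// Hypothetical helper: pick the most read articles from a key -> hits map.
// The input shape is an assumption made for this sketch, not the VCS API.
function topArticles(hitsByKey, limit) {
    const prefix = 'ARTICLE-';

    return Object.keys(hitsByKey)
        // Keep only the keys that represent article hits
        .filter(function (key) { return key.indexOf(prefix) === 0; })
        // Strip the prefix so we are left with the article ID
        .map(function (key) {
            return { id: key.slice(prefix.length), hits: hitsByKey[key] };
        })
        // Most hits first
        .sort(function (a, b) { return b.hits - a.hits; })
        .slice(0, limit);
}
```

Keys that do not start with ARTICLE- (a front page counter, for instance) are simply ignored, so the same statistics source can be shared with other dashboards.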

Presenting the data

After all the data has been retrieved and is available in a simple JSON array, presenting it using D3 is fairly simple. We group the results by category, then use the enter/exit pattern of D3 to easily add, remove and update nodes. The treemap algorithm automatically calculates x, y, width and height for our nodes, based on the defined size of our treemap:

// Initialize the treemap
var treemap = d3.layout.treemap()
    .size([width, height])
    .sticky(false)
    .value(function(d) { return d.size; });

var leaves = treemap(jsonData);

// Node-positioning function
var position = function() {
    this.style('left',   function(d) { return d.x + 'px'; })
        .style('top',    function(d) { return d.y + 'px'; })
        .style('width',  function(d) { return d.dx + 'px'; })
        .style('height', function(d) { return d.dy + 'px'; });
};

// Set background-image based on data
var getBackgroundStyle = function(d) {
    return 'url(' + d.img + ')';
};

// Select all nodes, join data on id
var nodes = domRoot
    .selectAll('.node')
    .data(leaves, function(d) { return d.id; });

// On new nodes...
nodes.enter().append('a')
    .attr('class', 'node') // so that selectAll('.node') finds these elements on the next update
    .attr('href', function(d) { return d.url; })
    .style('background', getBackgroundStyle)
    .text(function(d) { return d.name; })
    .call(position);

// Remove old nodes
nodes.exit().remove();

// Update existing nodes
nodes
    .style('background', getBackgroundStyle)
    .transition()
    .duration(750)
    .text(function(d) { return d.name; })
    .call(position);
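
The grouping by category mentioned before the listing is not shown in it. A minimal sketch of that step, assuming each article object carries a category property, could build the nested structure a D3 hierarchy layout expects (a root whose children are categories, whose children are articles). The function name groupByCategory, the 'root' and 'other' labels and the property names are assumptions for illustration.

```javascript
// Hypothetical helper: turn a flat article array into a two-level hierarchy.
// Assumes each article has a `category` property; articles without one
// end up in an 'other' group.
function groupByCategory(articles) {
    const groups = {};

    articles.forEach(function (article) {
        const key = article.category || 'other';
        if (!groups[key]) {
            groups[key] = { name: key, children: [] };
        }
        groups[key].children.push(article);
    });

    return {
        name: 'root',
        children: Object.keys(groups).map(function (key) { return groups[key]; })
    };
}
```

Grouping this way keeps articles from the same category next to each other in the treemap, since the layout subdivides each category's rectangle among its own articles.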

Conclusion

Having proven to myself that the visualization worked as I had hoped it would, I wanted to wrap it into a dashboard-like prototype that we could put up on a monitor in the office. Basically, there is now a node.js application doing four different tasks:

  • Fetches new lists of the most read articles every few seconds
  • Fetches article information for the top articles
  • Provides a simple data endpoint to retrieve the data we need
  • Serves a static web page that acts as our dashboard
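
The four tasks above can be sketched with nothing but Node's built-in http module. This is an illustration of the overall shape, not the application's real code: fetchTopList and fetchArticleInfo are hypothetical functions (assumed to return promises) standing in for the statistics and article lookups, and the /data path and placeholder HTML are likewise assumptions.

```javascript
const http = require('http');

// In-memory state shared between the poller and the HTTP handler
const state = { articles: [] };

// Placeholder page; the real dashboard would load D3 and request /data
const DASHBOARD_HTML = '<!doctype html><title>Most read</title><div id="treemap"></div>';

// Tasks 1 and 2: fetch the top list, then enrich each entry with article info.
// Both fetchers are hypothetical and injected by the caller.
async function refresh(fetchTopList, fetchArticleInfo) {
    const top = await fetchTopList();
    state.articles = await Promise.all(top.map(async function (entry) {
        const info = await fetchArticleInfo(entry.id);
        return Object.assign({ id: entry.id, size: entry.hits }, info);
    }));
}

// Tasks 3 and 4: a data endpoint and a static dashboard page
function handleRequest(req, res) {
    if (req.url === '/data') {
        res.writeHead(200, { 'Content-Type': 'application/json' });
        res.end(JSON.stringify(state.articles));
    } else if (req.url === '/') {
        res.writeHead(200, { 'Content-Type': 'text/html' });
        res.end(DASHBOARD_HTML);
    } else {
        res.writeHead(404, { 'Content-Type': 'text/plain' });
        res.end('Not found');
    }
}

// Wire it together: poll every 5 seconds and start listening
function start(fetchTopList, fetchArticleInfo, port) {
    setInterval(function () {
        refresh(fetchTopList, fetchArticleInfo).catch(function (err) {
            console.error(err);
        });
    }, 5000);
    return http.createServer(handleRequest).listen(port);
}
```

Keeping the latest result in memory means the data endpoint never waits on the upstream lookups; browsers always get the most recent completed poll.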

The solution is now available for anyone who wants to give it a try.

Taking it further

I wanted to make it a little more interactive, so I added some options for toggling images and article titles on and off, setting the number of articles to show, frequency of updates and the data timeframe to fetch.

What I found was that with a small timeframe (say, 10 seconds), the data was very dynamic, but a window that short might not contain enough hits to represent the full picture. With a timeframe of 30 seconds, we get a clearer picture of what is going on. If you still update every 10 seconds, you get a moving window that is fairly dynamic yet more representative of what people are actually reading.
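
To make the moving window concrete, here is a small sketch that sums hits over the most recent buckets. It assumes the statistics arrive as an array of fixed-length buckets ordered oldest to newest, each a plain key-to-hits object; that shape and the name sumWindow are assumptions for illustration, not the actual data format.

```javascript
// Hypothetical helper: total hits per key over the last `windowSeconds`.
// `buckets` is assumed to be ordered oldest -> newest, each bucket covering
// `bucketSeconds` of traffic as a key -> hits object.
function sumWindow(buckets, bucketSeconds, windowSeconds) {
    // Number of buckets needed to cover the window (at least one)
    const count = Math.max(1, Math.ceil(windowSeconds / bucketSeconds));
    const totals = {};

    buckets.slice(-count).forEach(function (bucket) {
        Object.keys(bucket).forEach(function (key) {
            totals[key] = (totals[key] || 0) + bucket[key];
        });
    });

    return totals;
}

// Three 10-second buckets, oldest first
const buckets = [
    { 'ARTICLE-1': 1 },
    { 'ARTICLE-1': 2, 'ARTICLE-2': 1 },
    { 'ARTICLE-1': 3 }
];

const last30 = sumWindow(buckets, 10, 30); // { 'ARTICLE-1': 6, 'ARTICLE-2': 1 }
const last10 = sumWindow(buckets, 10, 10); // { 'ARTICLE-1': 3 }
```

With a 10-second window, ARTICLE-2 drops out of the picture entirely, while the 30-second window still counts it. Calling the function every 10 seconds with a 30-second window gives the overlapping, moving view described above.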

Take a look at the current prototype at mestlest.vg.no. It was a fun project to make, and I will definitely be using D3 more in the future! Hope you like the prototype :-)


Developer at VG with a passion for Node.js, React, PHP and the web platform as a whole. espen.codes - @rexxars


3 comments

  • Per Buer

    Pretty cool. How much time did you spend on this?


    • Espen Hovlandsdal

      Hard to say - getting the first prototype up and running took about 8 hours, and that includes getting to know D3. Tweaking and improving it took a while longer, mostly because the prototype was left as a prototype for a good while until someone said we should make it open to the public.

      I rewrote the backend to handle traffic better, and spent a few days trying (and failing) to scale the font size correctly.

  • Jostein

    Indeed pretty cool. Could maybe have it running on a spare monitor or something. Would be cool as a screen saver too, but I'm not sure how easy it is to make screensavers from HTML.

    If it's meant to be exposed to the public (and not just a tech demo) I'd recommend adding some aria-live and tabindex attributes. It's possible, but a bit messy, to navigate the page using a keyboard (ChromeVox is great for testing).

