Stats and What We're Doing About Them

Some of you might have noticed timeouts on the statistics pages. And by some of you, I mean most of you. And since I have yet to meet a blogger who didn't love their stats or crave more of them, I can only assume these timeouts aren't warmly received as an opportunity to reflect on the qualitative statistics your mind can generate in lieu of the harder measurements we provide.

Before I get to the meat of this entry, let me tell you a little about how we got here. We log every request your visitors make in a database. The statistics pages combine that database with information about your entries, comments, and so on to generate the reports you see in real time.

When we were developing the initial release, we knew there was one looming flaw in our plan: we had no way to aggregate the data so that we could purge the request database periodically. As a result, that database has grown large and cumbersome, to the tune of tens of millions of records.

As you can imagine, compiling real-time statistics like ours from a dataset that large is an awesome task. And not the good kind of awesome. Something has to be done, and we're conflicted about the best course of action.

The first order of business was to separate the reports. Although it looked really slick, putting four of these reports on the same page was just asking for overload. By and large, the separation has worked: we're now seeing timeouts on only a few reports, and those are monsters.

The next item on the agenda is to clear the existing records out of the database. I say this with a heavy heart, because those tens of millions of rows contain half a million of my own blog's requests. But we must clear it out if we want to fix it: at its current size, pruning or correcting the data in place is untenable. Starting fresh also lets us finally eliminate the dreaded "Unknown" browser and operating system statistics. That alone makes it worthwhile to me, but I'd welcome your thoughts.

Once that's done, we can work out the best (read: most scalable) way to aggregate some of these statistics so we can offer you more meaningful data about your readers and how they interact with your blog.

The conflict, then, is between keeping a huge, sometimes-inaccurate store of past data and starting fresh nearly a year after our initial release. At first blush, it seems like an easy decision: wipe it. But on further thought, you realize that wiping it would eviscerate the most popular entries report, the article views report, and the printed articles report. Ideally, we'd find a way to aggregate the data for those reports before clearing it out, but that could take a while, and the database grows faster every day.
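
The "aggregate, then clear out" idea for the per-entry reports could look roughly like the sketch below. It is only an illustration under stated assumptions: it uses SQLite and a hypothetical `requests` table with one row per page view, plus a hypothetical `entry_views_daily` summary table; the post does not say what database or schema the service actually uses. Raw rows before a cutoff day are folded into per-entry daily counts and then deleted in the same transaction, so a most-popular-entries style report can be answered from the small summary table alone.

```python
import sqlite3

# Hypothetical schema: a raw request log plus a per-entry daily rollup.
SCHEMA = """
CREATE TABLE IF NOT EXISTS requests (
    id INTEGER PRIMARY KEY,
    blog_id INTEGER NOT NULL,
    entry_id INTEGER NOT NULL,
    requested_at TEXT NOT NULL          -- 'YYYY-MM-DD HH:MM:SS'
);
CREATE TABLE IF NOT EXISTS entry_views_daily (
    blog_id INTEGER NOT NULL,
    entry_id INTEGER NOT NULL,
    day TEXT NOT NULL,                  -- 'YYYY-MM-DD'
    views INTEGER NOT NULL,
    PRIMARY KEY (blog_id, entry_id, day)
);
"""


def roll_up_and_purge(conn: sqlite3.Connection, cutoff_day: str) -> int:
    """Fold raw requests before cutoff_day into daily counts, then delete them.

    Returns the number of raw rows purged.
    """
    with conn:  # commits on success, rolls everything back on an exception
        counts = conn.execute(
            """SELECT blog_id, entry_id, date(requested_at) AS day, COUNT(*)
               FROM requests
               WHERE requested_at < ?
               GROUP BY blog_id, entry_id, day""",
            (cutoff_day,),
        ).fetchall()
        # Add to existing totals rather than overwrite them, so late-arriving
        # rows for an already-rolled-up day are not lost (needs SQLite 3.24+).
        conn.executemany(
            """INSERT INTO entry_views_daily (blog_id, entry_id, day, views)
               VALUES (?, ?, ?, ?)
               ON CONFLICT (blog_id, entry_id, day)
               DO UPDATE SET views = views + excluded.views""",
            counts,
        )
        cur = conn.execute(
            "DELETE FROM requests WHERE requested_at < ?", (cutoff_day,)
        )
        return cur.rowcount


def most_popular_entries(conn: sqlite3.Connection, blog_id: int, limit: int = 10):
    """Answer a 'most popular entries' query from the rollup, not raw rows."""
    return conn.execute(
        """SELECT entry_id, SUM(views) AS total
           FROM entry_views_daily
           WHERE blog_id = ?
           GROUP BY entry_id
           ORDER BY total DESC, entry_id
           LIMIT ?""",
        (blog_id, limit),
    ).fetchall()
```

Because the summary holds one row per entry per day rather than one per request, its size grows with the number of entries and days, not with traffic.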

I know this was a fairly long entry, and I appreciate anyone who has read this far. We're going to clear everything older than 90 days out of the database because that is, after all, the sensible thing to do. We are moving that data to a separate database until our aggregation scheme is finished, and then we will merge it back in. So you can think of your older statistics as gone but not forgotten; soon you will be reunited. Thank you for your patience and understanding in this matter!
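
The 90-day cutoff, with older rows parked in a separate database rather than destroyed, could be sketched as follows. This is a hypothetical illustration: it assumes SQLite, a `requests` table whose `requested_at` column holds `'YYYY-MM-DD HH:MM:SS'` text, and an `archive_old_requests` helper invented for this example; the post does not describe the real implementation.

```python
import sqlite3
from datetime import datetime, timedelta


def archive_old_requests(conn: sqlite3.Connection, archive_path: str,
                         now: datetime, keep_days: int = 90) -> int:
    """Move request rows older than keep_days into a separate archive database.

    Assumes requested_at is 'YYYY-MM-DD HH:MM:SS' text, so string comparison
    matches chronological order. Returns the number of rows moved.
    """
    cutoff = (now - timedelta(days=keep_days)).strftime("%Y-%m-%d %H:%M:%S")
    # ATTACH and DETACH cannot run inside an open transaction, so they stay
    # outside the `with conn:` block.
    conn.execute("ATTACH DATABASE ? AS archive", (archive_path,))
    try:
        with conn:  # commit the copy and delete on success, roll back on error
            # On first use, create an empty archive table with the same columns.
            conn.execute(
                "CREATE TABLE IF NOT EXISTS archive.requests AS "
                "SELECT * FROM main.requests WHERE 0"
            )
            conn.execute(
                "INSERT INTO archive.requests "
                "SELECT * FROM main.requests WHERE requested_at < ?",
                (cutoff,),
            )
            moved = conn.execute(
                "DELETE FROM main.requests WHERE requested_at < ?", (cutoff,)
            ).rowcount
    finally:
        conn.execute("DETACH DATABASE archive")
    return moved
```

Keeping the archived rows in an ordinary database file means they can later be read back and run through whatever aggregation scheme is eventually built, which is the "gone but not forgotten" part.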

 

Comments

  • 12/11/2006 5:57 PM Alexander Wunderlich wrote:
    alright man, nuke it!
    1. 12/11/2006 6:00 PM Admin wrote:

      And then send it some candy? 

      John
      Quick Blog Team


  • 2/16/2007 9:38 PM Mukund Mohan wrote:
    How can I automate the process of putting the Google Analytics code on each blog entry, since I don't like your reporting and analytics at all?
    Mukund
    1. 2/16/2007 9:52 PM Bill wrote:
      That's easy! Get the Google Analytics script code from the site and paste it into a Custom HTML sidebar item on the Design > Sidebar page. Make sure you uncheck the "Show on blog" checkbox; otherwise, you'll have an empty box in your sidebar. There is no need to put the Google Analytics code on each entry.

      Hope that helped, Bill
      1. 8/1/2007 10:27 AM Ferg wrote:
        I found this thread through a Google search. I'm trying to get Google Analytics to work, and what you suggested does not work.

        Any ideas?

        Thanks
        1. 8/1/2007 5:38 PM Admin wrote:
          Greetings Ferg,

          Are you referring to using the sidebar to verify the site with Google Analytics? That was working, but we'll take a look at it.

          Regards
          John
          Quick Blogcast team
  • 8/1/2007 5:42 PM Ferg wrote:
    I got this working. It took a while: I had to put some other text ahead of the analytics code to get it to work. The tool was a bit buggy, though. Sometimes I had to exit and start over before it would accept any text.
  • 8/1/2007 8:05 PM Ferg wrote:
    Ok, I take that last comment back.

    I have Google Analytics working, but now I can't get the Verifier to work.

    The sidebar's custom text item seems to have some sort of bug where text gets lost.
  • 9/12/2007 12:58 PM Juan wrote:
    I have my Google Analytics code inserted as custom text in the sidebar.

    However, the figures Google reports are nowhere near what other statistics indicate, including Alexa's estimates.

    I am not really sure that the code is correctly placed to capture all visits to my blog.

    Would it be more effective to place the code in each post?

    Can I use Site Meter together with Google Analytics? Are the two compatible?

    I have another issue.

    No matter how hard I try, I can't place the Google verification code. I tried putting it in the side nav as custom text, but Google won't detect it.

    Can you help me with this too?

    One final question: my blog's home page is 41k. Is that too large to download smoothly?

    How can I reduce its size? Any suggestions?

    Too many questions,

    Thank You
    1. 9/17/2007 9:43 AM Admin wrote:
      Juan,

      Sorry for the delay. I'm still researching the best way to accommodate these requests and plan to write a post on this specific subject in the near future. Sitemeter and Google should coexist (I can't guarantee that; I just don't see a reason why they wouldn't work). I'm working on a foolproof way to put the Google code in place, since adding it as a hidden sidebar item does not seem to work for most people as we had thought it would.

      41k is not a large page size; even on a 28.8 kbps modem it takes under 15 seconds to download. If that's a concern for the type of users you have, there are many ways to decrease the size, such as using excerpts or removing unwanted sidebar items.

      John
      QBC Team
