Tuesday, June 11, 2013

Micro Data is more important than Big Data

The term "Big Data" seems to be all the rage these days. Everyone uses it in some way to describe what their software company does. I often goad friends who work at startups claiming to be Big Data companies because they are handling the Twitter stream or running X thousand transactions per second. I tell them to give me a call when their company gets up to scale a bit more.

In the online advertising world, especially with programmatic buying of ads, we have to handle nearly a million requests per second (and that is just the start of our scaling), make decisions in 5 ms, track everything that happens across the globe, and be able to report it to our clients within 5 minutes of it happening. At that kind of scale our budgeting systems have to be accurate to within seconds, or you could overspend by thousands of dollars. When you start to find that Amazon and Rackspace cloud environments can't handle your systems due to network speeds, you may have achieved a decent level of scale (not kidding, we actually took down an entire Rackspace data center at one point).

So with the large number of transactions that we handle and the massive amount of data that we store and process every second, what is it that's important to us, and what do we care about with our "Big Data"? There's not really much you can do with the mass of data as a whole, except maybe donate it to some research university to use in their studies, or keep it all somewhere and pay massive storage costs every month. No, it's not the Big Data that matters. What really matters is the "Micro Data" within that large data set. That is the interesting part, and without the large amount of data it's not really possible to find micro data trends.

In online advertising, the more micro the trend is, the more valuable it is. If we know that every left-handed race car driver in eastern Iowa is guaranteed to buy your product, then you are going to be willing to pay a lot of money to show that one person an ad (we of course care about a bit larger audience than that one guy). If we can find millions of micro trends that are valuable to advertisers, then we can really help guide their advertising budgets and make good decisions about how much to spend on showing an ad for any given request among the million requests we see each second.

I don't claim to have coined the term micro data. I first heard about it from a friend over beers one evening. He is a researcher at the University of Colorado and is in a research lab with Big Data in the name. One of the bodies of data that they use in their research is the US Census data, which contains a massive number of data points on every household in the United States. He told me that there is nothing interesting about saying that the average annual income of households in the US is $X, or that the average family in the country has 1.8 children and 2.3 dogs. Those statistics are meaningless to all but politicians who want to use meaningless data for whatever purpose they need. Instead, these researchers look at the micro trends in the data to understand things better. It's much more meaningful to know the average or median income of a specific block in downtown Boulder or the average number of people living in each household in a single block in South Boulder.
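To make the contrast concrete, here is a minimal sketch in Python of the same statistic computed over the whole data set and then per block. The records, block names, and function names are all made up for illustration; they are not Census data and not anything my friend's lab actually runs.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical household records: (block, annual_income, household_size).
# These values are invented for illustration only.
HOUSEHOLDS = [
    ("downtown-block-1", 42_000, 1),
    ("downtown-block-1", 58_000, 2),
    ("downtown-block-1", 250_000, 2),
    ("south-block-7", 95_000, 4),
    ("south-block-7", 105_000, 5),
    ("south-block-7", 110_000, 3),
]


def summarize(rows):
    """Return mean/median income and mean household size for the given rows."""
    incomes = [income for _, income, _ in rows]
    sizes = [size for _, _, size in rows]
    return {
        "mean_income": mean(incomes),
        "median_income": median(incomes),
        "mean_size": mean(sizes),
    }


def summarize_by_block(rows):
    """Group rows by block, then compute the same summary for each block."""
    by_block = defaultdict(list)
    for row in rows:
        by_block[row[0]].append(row)
    return {block: summarize(block_rows) for block, block_rows in by_block.items()}


if __name__ == "__main__":
    print("overall:", summarize(HOUSEHOLDS))
    for block, stats in summarize_by_block(HOUSEHOLDS).items():
        print(block, stats)
```

With this toy data the overall median income hides the fact that the two blocks look nothing alike, which is the whole point: the per-block numbers are the "micro data", and you only get them if you kept the big pile of individual records to begin with.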

That type of micro data helps in the advertising world as well, since that is how advertisers want to be able to control their spending on ad buys. We spend a huge amount of our time collecting and processing "Big Data" to discover the "Micro Data" within it, which is so much more interesting and valuable.

1 comment:

  1. The value of finding customers cannot be overrated. Marketers need to find people that are likely to purchase their products. Publishers want to monetize their media assets. Users are willing to accept advertisements placed on the pixels that they control.

    "Big data" can be quite useful for gathering enough statistical probability that a particular person is a member of the desired target audience. People seem more and more willing to share information useful to marketers and the companies that assist them to seek out the micro trends hiding inside all of the status updates, tweets and search engine queries.

    In the traditional publishing models the target audience is typically determined by the type of media produced by the publisher. If the magazine is about race cars then chances are good that race car drivers will find the media useful and spend time looking through the material.

    In the social media publishing model the content is produced by the users. They willingly share their copyright with the social media networks, who use their big data to sell advertisements. The trouble for me is that, unlike in the traditional publishing model, the company does not generate any content. They provide a platform for people to describe themselves. The social media networks should recognize the users' contributions and cut them in on the financial incentives provided by the deep pockets of the marketers.

    I've been trying to imagine a platform for social networking that enables the population at large to keep better track of their own micro data, and that enables marketers or internet services to ask their digital identity what they would like to see, providing financial rewards to those who want to publish their own life in the digital world.

    Perhaps I should just take the advice that you plastered on my wall when I sat in the office next to yours... Shut up and code!

    PeterM
