
Set My Data Free: a guest post by Tim O'Reilly

Posted by Matthew Stibbe
on 6 December 2006

Tim O'Reilly, coiner of the phrase "Web 2.0" and CEO of O'Reilly Media, agreed to do an email interview with me for an article I wrote recently. However, my questions to him got lost in the cloud and, like the star he is, he wrote a short essay based on my original description of the article - "Set your data free". I'm very grateful to him for his permission to run it here in full.

Tim's response

Set your data free? Well, maybe. I certainly agree that efforts like movemydata.org or wesabe.com's [NOW DEFUNCT] "User's data bill of rights" are the first of many declarations that user data belongs to the user, and that companies should or will make it easy for the user to move, delete or edit that data. Just as free and open source software became an unstoppable movement, so too will the free and open data movement one day be front page news in the computer industry.

But the reason that it will be front page news is that data is becoming so valuable that many companies want to put it under lock and key. Yes, some companies are promising open data, web search engines depend on the right to freely copy and make indexes as derivative works, and Web 2.0 phenomena such as mashups are dependent on data sharing between sites, but there is also a gravitational pull in the opposite direction.

The problem is this: open internet standards and open source software lead inevitably to the commodification of software. It's hard to charge for a web server or a web browser when there's a great free one implementing the same standard protocol. It's even getting harder to charge for an operating system, or programming tools, or even general business applications, when user expectations of functionality are standardized and there are free and open source offerings. But as Harvard Business School professor Clayton Christensen notes, there's a "Law of Conservation of Attractive Profits" that ensures that when some product becomes commoditized, value migrates to an adjacent level in the stack or product pipeline. In short, people are looking for new business models.

And as both hardware and software are being commoditized, value and business leverage are moving to data, and more specifically to large databases that harness network effects to get better the more people use them. eBay's continued dominance of the online auction market isn't the result of superior software so much as it's the result of first mover advantage in achieving a critical mass of user-generated buyer and seller data. Ditto Craigslist. Amazon and Barnes & Noble offer the same books, but Amazon's catalog is better because it has an order of magnitude more user-contributed data. And do you really think that another online encyclopedia could now displace Wikipedia?

Even for Google, much of whose data is extrinsic to the company, on the web itself, and available to competitors, market dominance is driven not just by superior algorithms, but by its richer database of relevant ads, its better data mining of user search behavior, and its richer insight into the data structure of the web. All of these are data competencies, and one secret to Google's speed is the extent to which it has built proprietary databases that accelerate the application of all the knowledge it has accumulated.

And when data is valuable, and the key to competitive advantage, companies don't want to share. And so, when Zooomr wanted to use Flickr's own API to populate its service with photos, Flickr said no. Flickr doesn't mind users taking down their own photos, but it doesn't want a competitor to do it wholesale. Similarly, Craigslist recently asked companies like SimplyHired and Indeed to stop spidering its job listings for their job aggregation sites.

Already, forward-thinking companies realize that a great deal of their business leverage is in their data. NavTeq, one of the two companies that dominate the street map data that drives everything from in-car navigation to web-based maps and directions, cheekily imitates the Intel Inside logo with a "NavTeq Onboard" badge on cars. And as early as 1991, General Motors realized that the value in the OnStar system would be in the data it collects -- and sure enough, as GPS technology becomes more widespread, businesses are using that data in new ways. For example, Norwich Union, a UK insurance company, has begun offering "Pay as You Drive" insurance, in which your rates are based not on where you live but on where, how fast, and how far you drive.

In short, I believe that we're in for a round of data wars analogous to the software API wars of the 1990s. Peter Bloom, a managing partner at General Atlantic Partners, recently brought to my attention the fact that there's currently a $40 million lawsuit over who owns a single data item: the 3 pm closing spot price of Texas crude oil.

Another front in the data wars is around online file sharing. While the music and movie industries are gradually seeking accommodation with purveyors of file sharing technology, and I believe that there will ultimately be a rich marketplace of paid content, there's no question but that the ease of digital copying challenges the business models of content industries based on scarcity. Instead, there is a new calculus in which the benefits of viral distribution compete in the bookkeeper's mind with the value of restricted access. In 2002, I wrote an essay entitled "Piracy is Progressive Taxation," in which I argued that for most creative works, obscurity is a greater danger than piracy, and if piracy brings greater visibility to what Chris Anderson later called the Long Tail [See the Bad Language post on The Long Tail], it will be worth some diminution in the revenues accruing to the top artists. And as Chris has so compellingly argued, the benefits of unlimited access to obscure long tail content have driven the success of new media giants, from Netflix and Amazon to Google.

But music, movies and books are the least of it. Personal fabrication machines are at the point on the cost curve where typesetters were just before the desktop publishing revolution. And already, on sites like Second Life, or Google Earth with SketchUp, or instructables.com, people are sharing designs for physical goods. What battles can we expect over open vs. closed data when the files that people are downloading from LimeWire or BitTorrent are not songs but stuff?

All in all, it's going to get a lot more interesting. Open data is essential, but as data becomes the locus for value capture, so too will closed data be essential. It's never either-or. Instead, as with software and with virtually every other part of the economy, there are tradeoffs between what you share and what you keep for your own advantage, and the secret of success is to make the right choices.
