Open data - libre not gratis

It started with a tweet - mischievous Maurizio sending the following:

  "The question [I] was *really* hoping to get at :-) - what's the cost of open data?"

I don't think I even mentioned open data in the CloudCom 2013 keynote, but as I pointed out to the questioner - that wouldn't stop me having an opinion.

So, an easy one: "open data" should be "free as in speech".

This somewhat trite response simply means that open data should be free for reuse without restriction, a sense more accurately conveyed by the word libre, borrowed from the French.

Such freedom to reuse should not be confused with gratis: supplied at no cost or, as the computing community have it, "free as in beer". Open data, like open source, may or may not be available without charge to the user, but it should always be available for reuse.

The Power of Information Task Force in the UK (and there have been several others worldwide with the same view) held that in releasing information, "data should be provided at marginal cost". According to Google:
  marginal cost: the cost added by producing one extra item of a product.

In the digital economy, marginal costs are often surprisingly small, but they are (by definition!) non-zero. Sometimes so few people want the data that the total cost of delivery is minuscule compared to the cost of collecting and curating it, and we decide not to charge. Also, since much government data is subject to a legal requirement to respond to Freedom of Information (FOI) requests, you might be better off spending your money on opening and delivering the data than on doing data archeology to answer each FOI request. So, when the driving force behind open data was transparency, not charging was probably an appropriate model: there are only so many Grauniad reporters and armchair accountants, and they do not repeatedly access the information to write a story.
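The argument can be put as a rough cost model. The linear form and the symbols F, c and q are our assumptions for illustration, not the Task Force's: suppose collecting and curating the data costs a fixed F per month, and delivering it costs c per user, for q users.

```latex
\begin{aligned}
C(q) &= F + c\,q \\
\mathrm{MC}(q) &= C(q+1) - C(q) = c > 0 \\
\frac{c\,q}{C(q)} &= \frac{c\,q}{F + c\,q} \ll 1 \quad \text{when } c\,q \ll F
\end{aligned}
```

Under this model the marginal cost c is non-zero, as the definition requires, but with only a handful of users the delivery share cq is a tiny fraction of the total, so charging c per user would recover almost nothing.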

8 armchair accountants downloading 1GB each = $1 per month
100,000 real users downloading 1GB each = $12,000 per month
More recently, though, open data has been touted as a driver of economic growth, and at this point it is worth thinking a bit about what marginal costs then mean. In the digital economy you can find yourself with a lot of users very quickly: one app sold onto smartphones could land you with tens of thousands of users within months, and suddenly the delivery bill becomes a consideration.
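The figures quoted above are consistent with a flat bandwidth price of about $0.12 per GB delivered. That rate is an assumption chosen because it reproduces those numbers; the post does not state it. A minimal sketch of how the delivery bill scales, using a hypothetical helper `monthly_delivery_cost`:

```python
from decimal import Decimal

# Assumed price in dollars per GB delivered (not from the post; picked
# because it reproduces the $1 and $12,000 figures quoted above).
PRICE_PER_GB = Decimal("0.12")


def monthly_delivery_cost(users: int, gb_per_user: Decimal,
                          price_per_gb: Decimal = PRICE_PER_GB) -> Decimal:
    """Hypothetical helper: one month's marginal delivery bill.

    Each user downloads gb_per_user once a month and every GB delivered
    costs price_per_gb, so the bill grows linearly with the user count.
    """
    return users * gb_per_user * price_per_gb


if __name__ == "__main__":
    armchair = monthly_delivery_cost(8, Decimal("1"))
    app_users = monthly_delivery_cost(100_000, Decimal("1"))
    print(f"8 armchair accountants: ${armchair:,.2f} per month")
    print(f"100,000 app users:      ${app_users:,.2f} per month")
```

This prints $0.96 (the "$1" above) and $12,000.00 per month: the price per GB never changes, only the number of people multiplying it.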

Certainly at this point a government department might decide not to try to build monstrously scalable computing systems to deliver all this data while also figuring out where the money will come from (in a world of shrinking government budgets...). One solution is to require reusers of the data to take one constantly updated copy of the data and run their own servers, scaling up with the number of their own customers; open data sources then need only scale to the number of commercial reusers of their data rather than the total number of smartphone app users.
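To see why the mirroring model helps, compare the publisher's bill in the two cases. This is a sketch under stated assumptions: the $0.12 per GB price, a 1GB dataset, 10 commercial reusers, and each reuser re-downloading the full dataset 30 times a month (a pessimistic stand-in for "constantly updated", since real updates might be incremental) are all hypothetical numbers, and `direct_cost` and `mirrored_cost` are illustrative helpers.

```python
from decimal import Decimal

PRICE_PER_GB = Decimal("0.12")  # assumed price per GB delivered


def direct_cost(app_users: int, gb_per_user: Decimal) -> Decimal:
    """Publisher serves every end user itself: cost scales with app users."""
    return app_users * gb_per_user * PRICE_PER_GB


def mirrored_cost(reusers: int, dataset_gb: Decimal,
                  refreshes_per_month: int) -> Decimal:
    """Publisher serves only commercial reusers, each pulling a full copy
    refreshes_per_month times and running its own servers for its customers.
    Cost scales with the number of reusers, not with their customers."""
    return reusers * refreshes_per_month * dataset_gb * PRICE_PER_GB


if __name__ == "__main__":
    one_gb = Decimal("1")
    print(f"direct, 100,000 users: ${direct_cost(100_000, one_gb):,.2f}")
    print(f"direct, 200,000 users: ${direct_cost(200_000, one_gb):,.2f}")
    print(f"mirrored, 10 reusers:  ${mirrored_cost(10, one_gb, 30):,.2f}")
```

Doubling the app users doubles the direct bill ($12,000 to $24,000 a month), while the mirrored bill stays at $36.00 however many customers the reusers sign up.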

And it would seem only sensible that commercial reusers could be asked to cough up for the marginal cost, since it would be a small cost to them and would give open data a sustainability plan.

Anyway, Lilian threatened me with an FOI request, so here I've come clean. Have a nice weekend.

Written on December 6, 2013