Open Data is not working – how to fix it?


Last April, together with the CTTI (Generalitat of Catalonia, our regional government), we organised a workshop on Open Data. We have been working intensively on the subject for quite some years, resulting in several papers, projects and a special article in the Communications of the ACM that will appear soon. We wanted to share our work with the Open Data community in Catalonia.

Since the early days when Marta Continente established the first Open Data portals in Catalonia, we have witnessed an explosion of initiatives around Open Data. Lots of cities have their own Open Data portal, with the ambition of ensuring transparency and stimulating the provision of services by third parties. Our reality, though, is not so different from that of many other places: the scale and maybe the level of commitment differ, but the results are mostly along the same lines.

As in many other places, outcomes are a poor match for the vision, at best. Maybe it is time to acknowledge that Open Data is not working the way we expected and needs to be fixed.

The best case for Open Data in the world is data.gov in the US (powered by the OKF's CKAN platform). With around 400K datasets from more than 200 organisations, such as federal agencies, sub-agencies and public-private partnerships (PPPs), and more than 50 Open Data challenges with prizes ranging from $30 to $15M, it is the largest and most comprehensive Open Data portal in the world.

However, there is another side to the coin. Although downloads exceeded 4M, fewer than 150 apps are a direct result of the portal, and of those only 24% have more than 10K downloads. Not a single one is in the top 100. Moreover, downloads are decreasing.

There is a huge contrast between ambition and reality, both in terms of the use of Open Data for transparency and as a way to spur civic applications.

The failure to materialize the bold vision of Open Data led to different kinds of fixes and remedies that we categorised into two generations.

The first generation of fixes revolved around an obsession with making a larger number of datasets available. The idea was that more datasets would lead not only to more apps but also to a greater level of transparency. Obstacles to opening data came mostly from the administration, where the difficulty of breaking up silos and restructuring IT processes took centre stage. Remedies aimed at these obstacles ranged from top-down efforts, such as laws and regulations enforcing the opening of data, to bottom-up efforts around restructuring and empowering policymakers and supporting citizens' demands.

These fixes had clear results in terms of the number and quality of datasets opened, but fell short of realising the vision of Open Data, particularly in terms of usage.

A second generation of remedies focused on the obstacles that could hinder the development and diffusion of apps. Data standardisation emerged as a major obstacle, together with the lack of continuous data through APIs instead of downloadable files. Many efforts were devoted to this. Translating civic and government needs for developers was also commonly addressed, through workshops combining policymakers, citizens, businesses, academics and developers.

All these efforts not only spread the importance of Open Data through administrations, where it found a clear place on the agenda, but also contributed to closing the gap between developers and policymakers, with a better understanding of both worlds that had implications well beyond Open Data.

However, the metrics were still mostly the same: how many datasets are open and how many hackathons, or events in general, have been organised.

And yes, these metrics are one of the problems. Open Data is not an end but a means to greater transparency and to promoting the development of apps by an ecosystem of for-profit and not-for-profit organizations.

If we don’t measure the objective, we will never get there. And the objective here is how much this Open Data is used for transparency and apps. Do our journalists use it? Do we have independent organisations that use it to check the fairness of government? How many apps, particularly popular ones, use Open Data? Does it result in more services for citizens?

Therefore, focusing on projects around opening datasets instead of fostering their use by journalists, civic organisations, researchers and developers is not conducive to our objective; it only leads to a large number of datasets that perhaps few people use.

In our research we detected four main problems beyond the metrics: Discovery, Standardisation, Trust & tools and Business Models.

Discovery

The discovery problem has two sides: on one side developers and on the other users.

Discovery is a real problem for developers. In Spain alone we have 8,000 local authorities … For a small team of developers it is simply impossible to find out where your Open Data portal is, or whether it exists at all.

Citizens and tourists have to find your app too. In case you hadn't noticed, cities don’t have a tab in Apple’s App Store, which makes it even more difficult.

The discovery problem is not impossible to solve, particularly the part regarding developers, but it is crucial: nobody is going to use what they cannot find.

Standardisation

Imagine that you are a developer and you want to build an app for finding the closest and cheapest parking (very fashionable right now). The basic questions are: Is the data you need available? Is it in the same format? Does it have the same meaning?

This last part is really important if you are a journalist or a civic organisation comparing budgets. Do the figures mean the same thing?

If data is not standardised, you will use only the data of your own city or of the big cities, resulting in no usage, or very limited usage at best, for small and medium-sized cities. Therefore, data without standardisation is useless for a large group of cities.
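To make the format-and-meaning problem concrete, here is a minimal Python sketch. The two city feeds, their field names and all values are hypothetical and made up for illustration; the point is that every non-standard feed needs its own hand-written adapter (here, different field names, packed coordinates and a price in cents per 30 minutes instead of euros per hour) before an app can even compare prices across cities.

```python
from dataclasses import dataclass

# Hypothetical feeds: two cities publish the same kind of parking data
# with different field names, layouts and units. All values are invented.
CITY_A_FEED = [
    {"nom": "Spot A1", "lat": 41.39, "lon": 2.17, "preu_hora": 3.10},
]
CITY_B_FEED = [
    {"name": "Spot B1", "location": "52.36,4.88", "price_cents_per_30min": 250},
]


@dataclass
class ParkingSpot:
    """Common schema the app needs: one name, coordinates, euros per hour."""
    name: str
    lat: float
    lon: float
    eur_per_hour: float


def from_city_a(rec: dict) -> ParkingSpot:
    # City A (hypothetical) already reports the price in euros per hour.
    return ParkingSpot(rec["nom"], rec["lat"], rec["lon"], rec["preu_hora"])


def from_city_b(rec: dict) -> ParkingSpot:
    # City B (hypothetical) packs coordinates into one string and prices
    # in cents per 30 minutes, so both layout and meaning must be converted.
    lat, lon = (float(x) for x in rec["location"].split(","))
    return ParkingSpot(rec["name"], lat, lon, rec["price_cents_per_30min"] * 2 / 100)


spots = [from_city_a(r) for r in CITY_A_FEED] + [from_city_b(r) for r in CITY_B_FEED]
cheapest = min(spots, key=lambda s: s.eur_per_hour)
print(cheapest.name, cheapest.eur_per_hour)
```

With a shared standard, the two adapter functions would collapse into one; without it, the work grows with every additional city, which is why small and medium-sized cities get left out.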

Trust & Tools

A major objective of Open Data is transparency, but transparency is not about local authorities providing better-presented information (they cannot be both party and arbiter). Nor is transparency usually aimed directly at citizens; we simply don’t have the time to investigate. Transparency is aimed at journalists and civic organisations, who will use this data to check the workings and performance of city halls.

How easy is it for journalists and civic organisations to trust and access this data? Can they easily use existing tools to navigate and check it? Do they have direct lines of communication so they can request the data they need and confirm its alleged accuracy?

Business Model

A major problem for developers is that there is no business model that could work well for civic apps.

Let’s take a medium-sized city like Amsterdam (600,000 inhabitants) or Barcelona (1,200,000), or their metro areas (around 3M each). Let’s choose a popular app that 5% of the population downloads at €0.99. Just do the numbers: it doesn’t work.
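Doing the numbers from the paragraph above as a quick Python sketch: the populations, the 5% uptake and the €0.99 price come from the text, while the 30% store commission is an assumption added here for illustration (the text gives no figure for it). Revenue is treated as a one-off purchase.

```python
# Populations, uptake and price as given in the text.
POPULATIONS = {
    "Amsterdam": 600_000,
    "Barcelona": 1_200_000,
    "Metro area (approx.)": 3_000_000,
}
UPTAKE = 0.05       # 5% of the population downloads the app
PRICE_EUR = 0.99    # one-off price per download
STORE_COMMISSION = 0.30  # assumption: typical app store cut, not from the text


def one_off_revenue(population: int) -> tuple[float, float]:
    """Return (gross, net) one-off revenue in euros for a given population."""
    buyers = population * UPTAKE
    gross = buyers * PRICE_EUR
    net = gross * (1 - STORE_COMMISSION)
    return gross, net


for area, pop in POPULATIONS.items():
    gross, net = one_off_revenue(pop)
    print(f"{area:<22} buyers={pop * UPTAKE:>9,.0f}  gross=€{gross:>10,.0f}  net=€{net:>10,.0f}")
```

Even in the most favourable case, a whole metro area of around 3M people, this comes to roughly €148,500 gross, once, before the store's cut; for Amsterdam alone it is about €29,700. That has to cover development, data maintenance and support for the lifetime of the app.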

You may say ads. Well, same thing: the audience is not large enough to compensate.

Certainly, the lack of a functioning business model is probably the major problem for city apps. Solving standardisation and discovery alleviates this problem a bit, but it is still there.

Again, everything starts with asking the right questions, because the metrics come from them. The right question in Open Data is not how many datasets are open (I am not implying that we have to limit the right to or availability of information) or how many events we organised, but how much Open Data is used for transparency and apps.

Changing the question is the first step to fixing the problem and, in this case, to realising the transformational vision of empowering citizens with data, with their own data, for transparency, civic services and growth.

This post is a summary of some of the Open Data research @ ESADE done by Melissa Jo Lee, Jonathan Wareham and Esteve Almirall. Should you be interested in knowing more, please follow this link.