My attention was caught the other day by an article in The Register: “Data.gov.uk chief admits transparency concerns”
The head of the government’s website for the release of public sector data has said it is a challenge to ensure that users can understand the statistics.
Cabinet Office official Richard Stirling, who leads the team that runs Data.gov.uk, said that if he was at the Office for National Statistics he would have concerns about statistical releases and people making assumptions “that aren’t quite valid”.
The article was based on a podcast interview with Richard, and in typical journalistic style, took one part of his message and ignored the rest. To get the full picture, listen to the original audio.
Bearing all this in mind, though, I do think this is an important issue which probably needs to be explored more thoroughly than it has been so far.
To use myself as an example: I’m a geek, and I like computers, the internet *and* I find government interesting. I suspect this puts me into a very small percentage of the population. But even then – other than thinking open government data is almost certainly a good thing, and being able to reel off all the arguments around transparency and improving services – I don’t really understand or know how this happens. I am completely data illiterate.
This takes two forms. Firstly, knowing what data is, what format it is in and what can be done with it. Essentially a techie thing – fine, the data is there, but how do I do anything with it? This is probably the least important problem, because producing apps and mashups probably isn’t something that everybody needs to be able to do.
The second form is more important, though, and that is based in statistical awareness, understanding of how data is manipulated, and a grasp of the context within which the original data was published.
In other words, if I come across some nifty app using open government data, how do I know what biases the developer had? Who – if anyone – paid them to do this? How can I check that the results it produces are correct?
Because even though the original data is published openly, and I can check that, the chances are I will not understand the relationship between that and the nifty app in question.
There’s always the argument that it was ever thus – not that it is a particularly good argument – but when statistical analysis appears in a newspaper, for example, most people are aware of the biases of those publications.
Don’t get me wrong – releasing data is important. But the technical challenges are of course the easy bit, whether sticking a CSV file on a web page or creating an API. What I am talking about is ensuring a reasonable level of data literacy for people at the receiving end.
Hadley Beeman’s project might well answer some of these issues, by providing a space for data to be stored, converted to a common format and appropriately annotated (assuming I have understood it correctly!).
Another possibility is a book being written in Canada, or the Straight Statistics site which seems full of good information (thanks to Simon for the tip).
But none of these seems to scratch the data literacy itch, really. We need interesting, well written, engaging content to help people get to a level where they can understand the process and context of open data. Might it even involve e-learning? It could do.
Hey Dave
Well said (and some really useful links in here…).
I’ll try and blog next week about a sister-project to Hadley’s I’ve been sketching out around many of the issues you raise (transparency and accountability of data re-users; highlighting in an accessible way important statistical things) – but the idea of some learning resources also sounds fantastic.
Been trying to work out how to communicate some of the ideas around open data in one-pager form (and struggling whilst far too embroiled in academic writing) – but would be up for maybe having another go at some of that…; perhaps developing some resources could tie in as a fringe of OKF’s Open Gov Data Camp in November: http://lists.okfn.org/pipermail/open-government/2010-July/000087.html
Tim
I definitely agree with you on the need for data literacy, but that’s not going to be an overnight job. In the meantime, having direct access to the data behind an app at least means that, with sufficient effort, you can investigate for yourself what ‘angle’ the creator of an app has taken and make your own alternative view if you want. (Or get a more data-literate mate to do it!)
I teach some statistical literacy both as part of my MA Online Journalism and training with journalists (I’ll be spending 2 days with Telegraph journalists in September, for example). As part of that I’ll be hoping to put together some online resources. For the moment, the part of this book chapter that deals with interrogating data has some tips: http://onlinejournalismblog.com/tag/data-journalism
Fantastic to hear that Paul – I wonder how easily that could be adapted for a local government audience?
Knowing what assumptions are built into the depiction of a particular data set is also relevant in the context of data published in official reports, especially in cases where data is made available in a raw form and the report includes summary views or tables that re-present parts of, or analyses of, that data in aggregated form.
Where this does happen, I think it would make sense for reports to include queries or formulae that describe how to go from the raw data to the re-presented/reported form so that other people can:
a) replicate the derivation of reported data from raw released data;
b) identify any assumptions contained within those formulae;
c) if there are assumptions, start to play around with them to see whether or not they are sensitive to adjustment.
I’ve blogged a few thoughts around this, along with an example scenario, here:
http://blog.ouseful.info/2010/06/28/so-where-do-the-numbers-in-government-reports-come-from/
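As a rough sketch of what publishing the derivation alongside a report might look like, here is a small Python example. Everything in it is invented for illustration: the figures, the column names (region, spend) and the function names are assumptions, not taken from any real release. It derives a "reported" table from raw rows, makes one built-in assumption explicit as a parameter (whether zero returns count as real values), and then flips that assumption to see how much the reported figure moves.

```python
import csv
import io
from statistics import mean

# Hypothetical raw release: invented figures, purely for illustration.
RAW_CSV = """region,spend
North,100
North,300
South,50
South,250
South,0
"""


def load_rows(text):
    """Parse the raw CSV text into a list of (region, spend) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["region"], float(row["spend"])) for row in reader]


def average_spend(rows, exclude_zero=True):
    """Derive the 'reported' table: mean spend per region.

    exclude_zero is the kind of assumption a report might bake in
    without saying so: is a zero a real value or a missing return?
    """
    by_region = {}
    for region, spend in rows:
        if exclude_zero and spend == 0:
            continue  # treat zero as "no data" under this assumption
        by_region.setdefault(region, []).append(spend)
    return {region: mean(values) for region, values in by_region.items()}


rows = load_rows(RAW_CSV)

# (a) replicate the reported figures from the raw data
reported = average_spend(rows, exclude_zero=True)

# (c) change the assumption and see whether the result is sensitive to it
alternative = average_spend(rows, exclude_zero=False)

for region in reported:
    print(region, reported[region], alternative[region])
```

With the formula published like this, point (b) comes for free: the assumption is visible as a named parameter rather than buried in a spreadsheet step nobody sees.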