My attention was caught the other day by an article in The Register: “Data.gov.uk chief admits transparency concerns”
The head of the government’s website for the release of public sector data has said it is a challenge to ensure that users can understand the statistics.
Cabinet Office official Richard Stirling, who leads the team that runs Data.gov.uk, said that if he was at the Office for National Statistics he would have concerns about statistical releases and people making assumptions “that aren’t quite valid”.
The article was based on a podcast interview with Richard, and in typical journalistic style, took one part of his message and ignored the rest. To get the full picture, listen to the original audio.
Bearing all this in mind, though, I do think this is an important issue, and one that probably needs to be explored more thoroughly than it has been so far.
To use myself as an example: I’m a geek. I like computers and the internet, *and* I find government interesting. I suspect this puts me in a very small percentage of the population. But even so, I don’t really understand how open data actually turns into those benefits. I can say that open government data is almost certainly a good thing, and I can reel off all the arguments about transparency and improving services, but that is as far as it goes. I am completely data illiterate.
This illiteracy takes two forms. The first is knowing what data is, what format it is in and what can be done with it. This is essentially a techie thing: fine, the data is there, but how do I do anything with it? It is probably the less important problem, because building apps and mashups isn’t something that everybody needs to be able to do.
The second form matters more. It is rooted in statistical awareness, an understanding of how data is manipulated, and a grasp of the context in which the original data was published.
In other words, if I come across some nifty app using open government data, how do I know what biases the developer had? Who – if anyone – paid them to do this? How can I check that the results it produces are correct?
Even though the original data is published openly, and I can check it, the chances are I will not understand the relationship between that data and the nifty app in question.
There’s always the argument that it was ever thus (not that it is a particularly good argument), but at least when statistical analysis appears in a newspaper, most people are aware of that publication’s biases.
Don’t get me wrong – releasing data is important. But the technical challenges are of course the easy bit, whether sticking a CSV file on a web page or creating an API. What I am talking about is ensuring a reasonable level of data literacy for people at the receiving end.
Hadley Beeman’s project could well address some of these issues by providing a space where data can be stored, converted to a common format and appropriately annotated (assuming I have understood it correctly!).
Other possibilities include a book being written in Canada and the Straight Statistics site, which seems full of good information (thanks to Simon for the tip).
But none of these really seems to scratch the data literacy itch. We need interesting, well-written, engaging content to help people reach a level where they can understand the process and context of open data. Might that even involve e-learning? It could do.