These are some of the questions that government employees posted up on a whiteboard at the start of the ‘Open Data in a Day, for Government’ workshop last week.
The workshop was organised by Stats NZ and DIA and facilitated by Open Data Institute trainer Ellen Broad.
Ellen started by defining open data as “data that anyone can access, use or share”.
It is:
and has an open licence.
Open government data:
Datasets with personal information, third-party IP and commercially or culturally sensitive information often cannot be open.
However, sometimes datasets can be modified. In New Zealand, where personal information is “information about an identifiable individual”, data can sometimes be ‘de-identified’ or ‘anonymised’ to ensure individual privacy is maintained.
This process has to be undertaken with care. Making it impossible to identify individual people requires more than just removing identifying information such as names.
Anonymising data safely while retaining its usability can be hard.
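As a rough illustration of why removing names is not enough, the sketch below counts how many records share each combination of remaining attributes. It is a simple k-anonymity-style check, not something presented at the workshop. The field names (`age_band`, `postcode`, `occupation`), the sample records and the function `risky_combinations` are all hypothetical.

```python
from collections import Counter


def risky_combinations(records, quasi_identifiers, k=2):
    """Return attribute combinations shared by fewer than k records.

    A combination that appears only once (or very few times) may still
    point to one person, even though names have been removed.
    """
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return [combo for combo, count in groups.items() if count < k]


# Hypothetical dataset with names already stripped out.
records = [
    {"age_band": "30-39", "postcode": "6011", "occupation": "nurse"},
    {"age_band": "30-39", "postcode": "6011", "occupation": "nurse"},
    {"age_band": "40-49", "postcode": "6011", "occupation": "teacher"},
    {"age_band": "40-49", "postcode": "6012", "occupation": "teacher"},
]

risky = risky_combinations(records, ["age_band", "postcode", "occupation"], k=2)
for combo in risky:
    print("Only one record matches:", combo)
```

Here the two teacher records are each unique on age band, postcode and occupation, so someone who knows those details could single them out. Passing a check like this does not prove a dataset is safe; it is only one of several things to look at before release.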
For datasets that contain personal information, Ellen recommends that agencies use the resources of the Office of the Privacy Commissioner to set up a Privacy Impact Assessment.
Ellen noted that some sectors may be bound by more than one set of privacy laws (eg, health, taxation, statistics).
Ellen believes that licensing open data sources is important. It’s what makes open data ‘open’.
A licence provides clarity: it sets out what users and re-users can and can’t do with the dataset.
The default licence for open government data in New Zealand is the Creative Commons Attribution 4.0 licence (CC BY), as set out in NZGOAL.
Ellen pointed out that only CC BY and CC BY-SA are open licences within the Creative Commons suite of licences.
“An open licence allows both commercial and non-commercial use, and allows you to alter the dataset – so you can mashup etc,” she said.
‘Share alike’ means that if you use an open data source, you must make whatever you publish or distribute from it open as well, under the same licence conditions.
Some licences require users of the dataset to register. Ellen says that while registration can act as a barrier, encouraging some feedback loop with publishers can be good (for example, so that users can report an error or an update).
This building of relationships is important. Communication works both ways – and one good data management practice is to provide a (real, with a person behind it!) email address and other contact details so users can ask further questions.
There are some useful tools you can use when publishing open data.