Using the ckanapi-exporter tool, you can extract all dataset metadata into a single CSV file.

Requirements

Steps

  1. Install requirements
  2. Create a columns.json file. It sets out preset data properties to extract from the data.govt.nz API, which you can customise if required. See below for the file contents.
  3. Run the following command in your terminal:
ckanapi-exporter --url 'https://catalogue.data.govt.nz' --columns columns.json > datasets.csv
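
Once the command has written datasets.csv, the file can be inspected with Python's standard csv module. The following is a minimal sketch, assuming the CSV header row uses the column names defined in columns.json (for example Title and Agency). The function name count_by_column and the sample rows are hypothetical and only stand in for real output.

```python
import csv
import io
from collections import Counter


def count_by_column(csv_file, column):
    """Count how many rows share each value in the given column."""
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader)


# Made-up sample standing in for datasets.csv; real output will differ.
sample = io.StringIO(
    "Title,Agency\n"
    "Example dataset A,Example Agency\n"
    "Example dataset B,Example Agency\n"
)
counts = count_by_column(sample, "Agency")
print(counts.most_common())
```

To run it against the real export, open the file with `open("datasets.csv", newline="", encoding="utf-8")` and pass the file object in place of `sample`.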

columns.json

{
 "Title": {
     "pattern": "^title$"
 },
 "Agency": {
     "pattern": ["^organization$", "^title$"]
 },
 "URL": {
     "pattern": "^url$"
 },
 "CatalogueCreated": {
     "pattern": "^metadata_created$",
     "max_length": 10
 },
 "CatalogueLastUpdated": {
     "pattern": "^metadata_modified$",
     "max_length": 10
 },
 "DatasetCreated": {
     "pattern": "^issued$",
     "max_length": 10
 },
 "DatasetLastUpdated": {
     "pattern": "^modified$",
     "max_length": 10
 },
 "FrequencyOfUpdate": {
     "pattern": "^frequency_of_update$"
 },
 "Rights": {
     "pattern": "^license_title$"
 },
 "FormatsAvailable": {
     "pattern": ["^resources$", "^format$"],
     "case_sensitive": true,
     "deduplicate": true
 },
 "Description": {
     "pattern": "^notes$"
 },
 "Tags": {
     "pattern": ["^tags$", "^display_name$"]
 },
 "Groups": {
     "pattern": ["^groups$", "^display_name$"]
 },
 "AgencyContact": {
     "pattern": "^author$"
 },
 "AgencyContactEmail": {
     "pattern": "^author_email$"
 },
 "AgencyContactPhone": {
     "pattern": "^author_phone$"
 },
 "DatasetContact": {
     "pattern": "^maintainer$"
 },
 "DatasetContactEmail": {
     "pattern": "^maintainer_email$"
 },
 "DatasetContactPhone": {
     "pattern": "^maintainer_phone$"
 },
 "PermanentIdentifier": {
     "pattern": "^id$"
 },
 "SourceIdentifier": {
     "pattern": "^source_identifier$"
 }
}
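
If you customise columns.json, it may help to check the file before running the export. The sketch below assumes only what the file above shows: each column has a "pattern" that is either a single regular expression string or a list of them. The function name check_columns is hypothetical, and this check is not part of ckanapi-exporter itself.

```python
import json
import re


def check_columns(columns):
    """Return a list of problems found in a columns.json-style mapping."""
    problems = []
    for name, spec in columns.items():
        pattern = spec.get("pattern")
        if pattern is None:
            problems.append(f"{name}: missing 'pattern'")
            continue
        # A pattern may be a single string or a list of strings.
        parts = [pattern] if isinstance(pattern, str) else pattern
        for part in parts:
            try:
                re.compile(part)
            except re.error as exc:
                problems.append(f"{name}: invalid regex {part!r} ({exc})")
    return problems


# Two entries copied from the columns.json above, used as a small example.
example = json.loads(
    '{"Title": {"pattern": "^title$"},'
    ' "Agency": {"pattern": ["^organization$", "^title$"]}}'
)
print(check_columns(example))
```

To check your own file, load it with `json.load(open("columns.json"))` and pass the result to `check_columns`. An empty list means every column has a pattern that compiles as a regular expression.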