splitgraph/socrata

Catalog of over 40000 open Socrata datasets available for querying through Splitgraph


Introduction

The Socrata data platform hosts tens of thousands of government datasets. Governments large and small publish data on crime, permits, finance, healthcare, research, performance, and more for citizens to use.

This data image catalogues all Socrata datasets that can be mounted in the Splitgraph engine, where they can be used as inputs to Splitfiles or queried directly from any PostgreSQL client. In essence, Splitgraph acts as a PostgreSQL-to-Socrata connector.

Usage

Each dataset has a unique "four-by-four" Socrata dataset ID (for example, 28km-gtjn) and is hosted on a specific domain (for example, data.cityofchicago.org).
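A domain plus a four-by-four is enough to address a dataset directly. The sketch below assumes Socrata's standard SODA resource endpoint (https://<domain>/resource/<id>.json) and assumes an ID is always two groups of four lowercase letters or digits joined by a hyphen; resource_url is a hypothetical helper, not part of Splitgraph.

```python
import re

# Assumption: a four-by-four is two groups of four lowercase letters or
# digits joined by a hyphen, as in "28km-gtjn" or "4p54-9544".
FOUR_BY_FOUR = re.compile(r"[a-z0-9]{4}-[a-z0-9]{4}")


def resource_url(domain: str, dataset_id: str) -> str:
    """Return the SODA JSON endpoint for one dataset on a Socrata domain."""
    if not FOUR_BY_FOUR.fullmatch(dataset_id):
        raise ValueError(f"not a four-by-four ID: {dataset_id!r}")
    return f"https://{domain}/resource/{dataset_id}.json"


print(resource_url("data.cityofchicago.org", "28km-gtjn"))
# prints https://data.cityofchicago.org/resource/28km-gtjn.json
```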

The easiest way to search through this catalog is to clone the data image and query it:

$ sgr clone splitgraph/socrata:latest --download-all
$ sgr sql -i splitgraph/socrata:latest \
    "SELECT domain, id, name FROM datasets WHERE name ILIKE '%covid%'"

data.austintexas.gov  4p54-9544  Austin Code COVID-19 Complaint Cases
data.calgary.ca       uq24-jkwv  COVID-19 Requests
data.cambridgema.gov  4nyp-vuze  OLD - Confirmed Cambridge COVID-19 Cases - OLD
data.cambridgema.gov  inw8-ircw  Confirmed COVID-19 Cases in Cambridge
data.cambridgema.gov  tdt9-vq5y  COVID-19 Cumulative Cases by Date
data.cdc.gov          9bhg-hcku  Provisional COVID-19 Death Counts by Sex, Age, and State
data.cdc.gov          b58h-s9zx  Provider Relief Fund COVID-19 High-Impact Payments
data.cdc.gov          hc4f-j6nb  Provisional Death Counts for Coronavirus Disease (COVID-19)
data.cdc.gov          hk9y-quqm  Conditions contributing to deaths involving coronavirus disease 2019 (COVID-19), by age group, United States.
data.cdc.gov          kn79-hsxy  Provisional COVID-19 Death Counts in the United States by County
...
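If you search the catalog from a script, user input should be escaped before it goes into the ILIKE pattern. The following is a sketch, assuming PostgreSQL's default LIKE escape character (backslash) and standard-conforming string literals; catalog_search_sql and search_catalog are hypothetical helpers that only wrap the sgr sql invocation shown above.

```python
import subprocess


def catalog_search_sql(term: str) -> str:
    """Build a case-insensitive, literal name search on the datasets table."""
    # Escape LIKE wildcards so "%" and "_" in the term match literally.
    pattern = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    # Double single quotes so the term stays inside the SQL string literal.
    pattern = pattern.replace("'", "''")
    return (
        "SELECT domain, id, name FROM datasets "
        f"WHERE name ILIKE '%{pattern}%'"
    )


def search_catalog(term: str) -> str:
    """Run the search through the sgr CLI and return its text output."""
    result = subprocess.run(
        ["sgr", "sql", "-i", "splitgraph/socrata:latest", catalog_search_sql(term)],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, search_catalog("covid") issues the same query as the command-line example.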

Note the dataset's domain and Socrata ID, then use the sgr client to mount it:

$ sgr mount socrata chicago_data \
    --handler-options '{"domain": "data.cityofchicago.org",
                        "tables": {"fire_stations": "28km-gtjn"},
                        "app_token": "YOUR_APP_TOKEN"}'
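Hand-written JSON inside shell quotes is easy to break (a stray backslash line continuation inside the single quotes makes it invalid). One way to sidestep this, sketched here with a hypothetical mount_command helper, is to build the handler options as a dictionary and serialize them with json.dumps. Only the sgr mount arguments shown above are used.

```python
import json
import shlex


def mount_command(schema, domain, tables=None, app_token=None):
    """Build the argv for `sgr mount socrata`, serializing options as JSON."""
    options = {"domain": domain}
    if tables:
        # Maps table names to Socrata four-by-fours; optional.
        options["tables"] = tables
    if app_token:
        # Optional; anonymous requests may be throttled.
        options["app_token"] = app_token
    return ["sgr", "mount", "socrata", schema,
            "--handler-options", json.dumps(options)]


argv = mount_command("chicago_data", "data.cityofchicago.org",
                     {"fire_stations": "28km-gtjn"}, "YOUR_APP_TOKEN")
# Print a shell-safe version of the command; pass argv to
# subprocess.run() to execute it instead.
print(shlex.join(argv))
```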

This creates a foreign table fire_stations in the chicago_data schema on your engine. When you query it, Splitgraph rewrites the request into a SoQL query and forwards it to the Socrata server that hosts the dataset, in this case the Chicago Fire Stations dataset.
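To give a feel for what such a forwarded request looks like, here is a simplified sketch that encodes pieces of a SQL query as SoQL parameters ($select, $where, $order, $limit) on a SODA resource URL. It is illustrative only, not Splitgraph's actual translation logic, and the column names name and address are hypothetical.

```python
from urllib.parse import urlencode


def soql_url(domain, dataset_id, select=None, where=None, order=None, limit=None):
    """Encode the parts of a SQL query as SoQL parameters on a SODA URL."""
    params = {}
    if select:
        params["$select"] = ", ".join(select)  # SQL column list
    if where:
        params["$where"] = where               # SQL WHERE clause
    if order:
        params["$order"] = order               # SQL ORDER BY
    if limit is not None:
        params["$limit"] = limit               # SQL LIMIT
    base = f"https://{domain}/resource/{dataset_id}.json"
    return f"{base}?{urlencode(params)}" if params else base


# Roughly: SELECT name, address FROM chicago_data.fire_stations
#          ORDER BY name LIMIT 5
print(soql_url("data.cityofchicago.org", "28km-gtjn",
               select=["name", "address"], order="name", limit=5))
```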

The table schema and other metadata will be discovered automatically from the Socrata API.
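As a rough illustration of schema discovery, the sketch below reads the "columns" list from a dataset's view metadata and maps each column's fieldName and dataTypeName to a PostgreSQL type. The metadata endpoint (https://<domain>/api/views/<id>.json) is assumed, and the type mapping is hypothetical; Splitgraph's real mapping may differ.

```python
import json
import urllib.request

# Hypothetical mapping from Socrata column types to PostgreSQL types;
# Splitgraph's real mapping may differ. Unknown types fall back to text.
TYPE_MAP = {
    "text": "text",
    "number": "numeric",
    "checkbox": "boolean",
    "calendar_date": "timestamp",
}


def columns_from_metadata(metadata):
    """Turn the "columns" list of a view's metadata into (name, type) pairs."""
    return [
        (col["fieldName"], TYPE_MAP.get(col["dataTypeName"], "text"))
        for col in metadata.get("columns", [])
    ]


def discover_columns(domain, dataset_id):
    """Fetch view metadata from the assumed endpoint and derive column types."""
    url = f"https://{domain}/api/views/{dataset_id}.json"
    with urllib.request.urlopen(url) as resp:
        return columns_from_metadata(json.load(resp))
```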

The app token is optional, but requests without it are anonymous and may be throttled. See the Socrata API reference for how to obtain an app token.
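Outside Splitgraph, the same token can be sent with direct SODA requests. Socrata accepts it in the X-App-Token HTTP header. The sketch below attaches it only when one is available; soda_request and fetch_rows are hypothetical helpers.

```python
import json
import urllib.request


def soda_request(url, app_token=None):
    """Build a SODA request, attaching the app token if one is given."""
    headers = {"Accept": "application/json"}
    if app_token:
        # Without this header the request is anonymous and may be throttled.
        headers["X-App-Token"] = app_token
    return urllib.request.Request(url, headers=headers)


def fetch_rows(url, app_token=None):
    """Perform the request and decode the JSON rows it returns."""
    with urllib.request.urlopen(soda_request(url, app_token)) as resp:
        return json.load(resp)
```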

Finally, the tables field is also optional. Without it, Splitgraph mounts every dataset on that domain as a foreign table, named after the Socrata dataset name followed by its ID (for example, building_violations_22u3_xenr). Mounting does not download any actual data from Socrata, but it lets you explore all data on that domain from any PostgreSQL client.
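The naming scheme can be approximated as follows. This is a hypothetical reconstruction that reproduces the building_violations_22u3_xenr example, not Splitgraph's exact rule. It also assumes names should fit PostgreSQL's 63-byte identifier limit, so it trims the name part and keeps the unique ID suffix intact.

```python
import re


def table_name(dataset_name, dataset_id, max_len=63):
    """Hypothetical: derive a table name like building_violations_22u3_xenr."""
    # Lowercase and collapse every run of non-alphanumerics into "_".
    slug = re.sub(r"[^a-z0-9]+", "_", dataset_name.lower()).strip("_")
    suffix = dataset_id.replace("-", "_")
    # Leave room for "_" plus the ID so the result stays within max_len.
    room = max_len - len(suffix) - 1
    slug = slug[:room].rstrip("_")
    return f"{slug}_{suffix}" if slug else suffix
```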

Mounting all datasets from a domain that serves 500 datasets takes a few seconds.

Licensing and contact

Socrata datasets aren't hosted by Splitgraph. Check the description field in the datasets table and other metadata fields for contact information and the dataset's license.

Source

The catalog is built from the Socrata Discovery API at http://api.us.socrata.com/api/catalog/v1. See the API docs for reference.
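A minimal sketch of paging through the Discovery API to produce rows like those in the datasets table. It assumes the q, domains, limit and offset query parameters, and a response whose results entries carry resource.id, resource.name and metadata.domain; check the API docs for the exact parameter set and response shape. All helper names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlencode

CATALOG_URL = "http://api.us.socrata.com/api/catalog/v1"


def catalog_page_url(domain=None, query=None, limit=100, offset=0):
    """Build a Discovery API URL for one page of results."""
    params = {"limit": limit, "offset": offset}
    if domain:
        params["domains"] = domain
    if query:
        params["q"] = query
    return f"{CATALOG_URL}?{urlencode(params)}"


def summarize(response):
    """Extract (domain, id, name) triples from one response page."""
    return [
        (r["metadata"]["domain"], r["resource"]["id"], r["resource"]["name"])
        for r in response.get("results", [])
    ]


def fetch_page(**kwargs):
    """Download and summarize one page."""
    with urllib.request.urlopen(catalog_page_url(**kwargs)) as resp:
        return summarize(json.load(resp))


def iter_catalog(domain=None, query=None, page_size=100):
    """Yield triples page by page until a short page signals the end."""
    offset = 0
    while True:
        rows = fetch_page(domain=domain, query=query, limit=page_size, offset=offset)
        yield from rows
        if len(rows) < page_size:
            break
        offset += page_size
```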