Catalog of Socrata datasets available for mounting in Splitgraph
Introduction
The Socrata data platform hosts tens of thousands of government datasets. Governments large and small publish data on crime, permits, finance, healthcare, research, performance, and more for citizens to use.
This data image catalogues all Socrata datasets that are available for mounting in the Splitgraph engine, where they can be used
as inputs to Splitfiles or queried from any PostgreSQL client. In essence, Splitgraph can act as a PostgreSQL-to-Socrata connector.
Usage
Each dataset has a unique "four-by-four" (a Socrata dataset ID, for example, 28km-gtjn) and is hosted on a certain domain, for example, data.cityofchicago.org.
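As a quick sanity check before mounting, the sketch below tests whether a string looks like a four-by-four. The pattern (two groups of four lowercase letters or digits joined by a hyphen) is an assumption based on IDs such as 28km-gtjn, and the helper name is_four_by_four is hypothetical.

```python
import re

# Assumed shape of a Socrata ID: four characters, a hyphen, four characters,
# each drawn from lowercase letters and digits (for example, 28km-gtjn).
FOUR_BY_FOUR = re.compile(r"[a-z0-9]{4}-[a-z0-9]{4}")


def is_four_by_four(dataset_id: str) -> bool:
    """Return True if dataset_id matches the assumed four-by-four pattern."""
    return FOUR_BY_FOUR.fullmatch(dataset_id) is not None


print(is_four_by_four("28km-gtjn"))      # an ID taken from this catalog
print(is_four_by_four("fire_stations"))  # a table name, not a dataset ID
```

This only checks the format; whether the ID exists on a given domain still has to be looked up in the catalog.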
The easiest way to search through this catalog is to clone the data image and query it:
$ sgr clone splitgraph/socrata:latest --download-all
$ sgr sql -i splitgraph/socrata:latest \
"SELECT domain, id, name FROM datasets WHERE name ILIKE '%covid%'"
data.austintexas.gov 4p54-9544 Austin Code COVID-19 Complaint Cases
data.calgary.ca uq24-jkwv COVID-19 Requests
data.cambridgema.gov 4nyp-vuze OLD - Confirmed Cambridge COVID-19 Cases - OLD
data.cambridgema.gov inw8-ircw Confirmed COVID-19 Cases in Cambridge
data.cambridgema.gov tdt9-vq5y COVID-19 Cumulative Cases by Date
data.cdc.gov 9bhg-hcku Provisional COVID-19 Death Counts by Sex, Age, and State
data.cdc.gov b58h-s9zx Provider Relief Fund COVID-19 High-Impact Payments
data.cdc.gov hc4f-j6nb Provisional Death Counts for Coronavirus Disease (COVID-19)
data.cdc.gov hk9y-quqm Conditions contributing to deaths involving coronavirus disease 2019 (COVID-19), by age group, United States.
data.cdc.gov kn79-hsxy Provisional COVID-19 Death Counts in the United States by County
...
Note the domain and the Socrata ID and use the sgr client to mount the dataset:
sgr mount socrata chicago_data \
--handler-options '{"domain": "data.cityofchicago.org",
"tables": {"fire_stations": "28km-gtjn"},
"app_token": "YOUR_APP_TOKEN"}'
This will create a table fire_stations in the chicago_data schema on your engine. When the table is queried,
the engine rewrites the request into a SoQL query and forwards it to the Socrata server,
which queries the relevant dataset. In this case, it will query the Chicago Fire Stations dataset.
The table schema and other metadata will be discovered automatically from the Socrata API.
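To give a feel for what a forwarded SoQL request can look like, here is a minimal sketch that builds a Socrata resource URL with $select, $where, and $limit parameters. It is not Splitgraph's actual translation logic. The helper name soda_query_url and the column names name and zip are hypothetical and used only for illustration.

```python
from urllib.parse import urlencode


def soda_query_url(domain, dataset_id, select=None, where=None, limit=None):
    """Build a resource URL for a dataset, carrying optional SoQL parameters."""
    params = {}
    if select:
        params["$select"] = ", ".join(select)
    if where:
        params["$where"] = where
    if limit is not None:
        params["$limit"] = limit
    base = f"https://{domain}/resource/{dataset_id}.json"
    # urlencode percent-encodes "$" and other reserved characters.
    return f"{base}?{urlencode(params)}" if params else base


# Roughly what "SELECT name FROM fire_stations WHERE zip = '60601' LIMIT 5"
# might turn into (hypothetical columns, illustrative mapping only).
print(soda_query_url(
    "data.cityofchicago.org",
    "28km-gtjn",
    select=["name"],
    where="zip = '60601'",
    limit=5,
))
```

Pushing filters and limits down in this way is what lets the engine avoid downloading a whole dataset to answer a narrow query.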
The app token is optional, but requests without it are anonymous and can be throttled. See the Socrata API reference for how to get an app token.
Finally, the tables field is optional as well. Without it, Splitgraph will mount all datasets provided
by that domain as foreign tables, giving the tables human-readable names consisting of the Socrata dataset name and the Socrata dataset ID, for example, building_violations_22u3_xenr. This will not download any actual data from Socrata but will let you explore all data on that domain in any PostgreSQL client.
Full mounting takes a few seconds on a domain that serves up 500 datasets.
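When mounts are scripted, writing the handler options JSON by hand is easy to get wrong. The sketch below assembles the same sgr mount invocation shown earlier from Python, leaving out the optional tables and app_token fields when they are not given. The helper name build_mount_command is hypothetical; the option keys are the ones used in this document.

```python
import json
import shlex


def build_mount_command(schema, domain, tables=None, app_token=None):
    """Return the sgr mount command as an argument list."""
    options = {"domain": domain}
    if tables:
        # Maps local table names to Socrata dataset IDs.
        options["tables"] = tables
    if app_token:
        options["app_token"] = app_token
    return ["sgr", "mount", "socrata", schema,
            "--handler-options", json.dumps(options)]


# Mount a single dataset under a chosen table name.
cmd = build_mount_command(
    "chicago_data",
    "data.cityofchicago.org",
    tables={"fire_stations": "28km-gtjn"},
)
print(shlex.join(cmd))

# Mount every dataset on the domain by omitting "tables".
print(shlex.join(build_mount_command("chicago_data", "data.cityofchicago.org")))
```

The argument list can be passed to subprocess.run as-is, which avoids shell quoting problems; shlex.join is used here only to print a copy-pasteable command.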
Licensing and contact
Socrata datasets aren't hosted by Splitgraph. Check the description field in the datasets table and other
metadata fields for contact information and the dataset's license.
Source
Socrata Discovery API at http://api.us.socrata.com/api/catalog/v1. See the API docs for reference.
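For readers who want to look at the upstream source directly, the following sketch builds a search URL against the Discovery API endpoint named above. The query parameters q, domains, and limit reflect my understanding of that API and should be checked against its documentation; the helper names are hypothetical, and fetch_catalog is defined but not called here because it needs network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CATALOG_URL = "http://api.us.socrata.com/api/catalog/v1"


def catalog_search_url(query, domain=None, limit=10):
    """Build a Discovery API search URL (parameter names are assumptions)."""
    params = {"q": query, "limit": limit}
    if domain:
        params["domains"] = domain
    return f"{CATALOG_URL}?{urlencode(params)}"


def fetch_catalog(url):
    """Fetch the URL and return the decoded JSON response."""
    with urlopen(url) as response:
        return json.load(response)


print(catalog_search_url("covid", domain="data.cdc.gov"))
```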