Query the Data Delivery Network
Query the DDNThe easiest way to query any data on Splitgraph is via the "Data Delivery Network" (DDN). The DDN is a single endpoint that speaks the PostgreSQL wire protocol. Any Splitgraph user can connect to it at data.splitgraph.com:5432
and query any version of over 40,000 datasets that are hosted or proxied by Splitgraph.
For example, you can query the dig_ticket_notifications
table in this repository, by referencing it like:
"cityofchicago/dig-ticket-notifications-cygx-ui4j:latest"."dig_ticket_notifications"
or in a full query, like:
SELECT
":id", -- Socrata column ID
"dig_ticket_", -- Dig ticket number which can be reviewed through the 811 user portal.
"expirationdate", -- The date and time at which the dig ticket expires.
"direction", -- The direction component of the street where the excavation will occur.
"notice_history_site_id", -- The unique site number associated with each utility owner. The 811 system identifies utility owners by their site number.
"location", -- The location of the midpoint of the address range in a format that allows for creation of maps and other geographic operations on this data portal.
"placement", -- The location within the public right of way where the excavation will occur. Excavations on private property do not have placement data associated with them.”
":@computed_region_vrxf_vc4k",
":@computed_region_6mkv_f3dw",
"laststatustime", -- The last time the ticket was updated. This can be either an update from the City of Chicago to a utility or a response from the utility.
"notice_history_status", -- Utility owners respond to the ticket via the 811 system using the following statuses. All utilities should at a minimum respond "acknowledged." The use of the other status codes is a company by company decision. a. “Acknowledged” - a notification from the utility owner that they received the dig ticket through their web service. This is the most common status and often is not updated. b. “Marked” - a notification from the utility owner that they marked the location of their facility at the address on the dig ticket. c. “In Route” - a notification from the utility owner that their locator is on the way to the address on the dig ticket. d. “Positive Response / Clear” - a notification from the utility owner that they do not have facilities near the address on the dig ticket. e. “Unacknowledged” - a notification from the system that the utility owner has not acknowledged receiving the dig ticket.
"permit_", -- The permit associated with the dig ticket if one exists.
":@computed_region_rpca_8um6",
"longitude", -- The longitude of the midpoint of the address range.
"requestdate", -- The date and time on which the dig request was made.
":@computed_region_bdys_3d7i",
"suffix", -- The suffix of the street where the excavation will occur.
":@computed_region_43wa_7qmu",
"last_member_transmission_flag", -- The most recent record for each utility associated with a unique ticket. The most recent record for each utility on a dig ticket will be marked True and all others will be marked False.
"stname", -- The name of the street where the excavation will occur.
"emergency", -- Indicates whether dig ticket was an emergency request.
"notice_history_type", -- The type of ticket issued including: a. “Initial ticket” - the first ticket issued. Utility owners have 48 hours (recorded as the “DIGDATE”) to mark the location of their facilities at the address on the ticket. b. “No-show” - a follow-up request associated with an initial ticket that notifies specific utilities that their facility was not marked by the dig date. c.“Remark request” - a follow-up request associated with an initial ticket that notifies specific utilities that they need to put new location marks at the site. d. “Cancellation Notice” - a notice to the utility owners that the dig ticket has been cancelled. e. “Notification of renewal” - a notice to the utility owners that the dig ticket expiration date has been extended. f. “Update ticket” - a notice to the utility owners that information on the dig ticket has been updated.
"stnofrom", -- The start of the street address where the excavation will occur. The 811 system creates separate dig tickets for address ranges by breaking them into even and odd blocks.
"primarycontactfirst", -- The first name of the individual or company requesting the dig ticket.
"notice_history_created", -- The date and time when the notice was sent to the utility.
"latitude", -- The latitude of the midpoint of the address range.
"notice_history_utility_name", -- The name of the utility company notified. Each dig ticket is sent to one or more utility owners. Thus, the dig ticket number as well as most of the data associated with the dig ticket will be repeated for each utility owner that was notified under that ticket.
"primarycontactlast", -- The last name of the individual or company requesting the dig ticket.
"digdate", -- The date and time before which the excavator is not permitted to break ground.
"stnoto", -- The end of the street address where the excavation will occur. The 811 system creates separate dig tickets for address ranges by breaking them into even and odd blocks.
"uniquekey" -- A unique number that differentiates each record in the data set.
FROM
"cityofchicago/dig-ticket-notifications-cygx-ui4j:latest"."dig_ticket_notifications"
LIMIT 100;
Connecting to the DDN is easy. All you need is an existing SQL client that can connect to Postgres. As long as you have a SQL client ready, you'll be able to query cityofchicago/dig-ticket-notifications-cygx-ui4j
with SQL in under 60 seconds.
Query Your Local Engine
bash -c "$(curl -sL https://github.com/splitgraph/splitgraph/releases/latest/download/install.sh)"
Read the installation docs.
Splitgraph Cloud is built around Splitgraph Core (GitHub), which includes a local Splitgraph Engine packaged as a Docker image. Splitgraph Cloud is basically a scaled-up version of that local Engine. When you query the Data Delivery Network or the REST API, we mount the relevant datasets in an Engine on our servers and execute your query on it.
It's possible to run this engine locally. You'll need a Mac, Windows or Linux system to install sgr
, and a Docker installation to run the engine. You don't need to know how to actually use Docker; sgr
can manage the image, container and volume for you.
There are a few ways to ingest data into the local engine.
For external repositories, the Splitgraph Engine can "mount" upstream data sources by using sgr mount
. This feature is built around Postgres Foreign Data Wrappers (FDW). You can write custom "mount handlers" for any upstream data source. For an example, we blogged about making a custom mount handler for HackerNews stories.
For hosted datasets (like this repository), where the author has pushed Splitgraph Images to the repository, you can "clone" and/or "checkout" the data using sgr clone
and sgr checkout
.
Cloning Data
Because cityofchicago/dig-ticket-notifications-cygx-ui4j:latest
is a Splitgraph Image, you can clone the data from Spltgraph Cloud to your local engine, where you can query it like any other Postgres database, using any of your existing tools.
First, install Splitgraph if you haven't already.
Clone the metadata with sgr clone
This will be quick, and does not download the actual data.
sgr clone cityofchicago/dig-ticket-notifications-cygx-ui4j
Checkout the data
Once you've cloned the data, you need to "checkout" the tag that you want. For example, to checkout the latest
tag:
sgr checkout cityofchicago/dig-ticket-notifications-cygx-ui4j:latest
This will download all the objects for the latest
tag of cityofchicago/dig-ticket-notifications-cygx-ui4j
and load them into the Splitgraph Engine. Depending on your connection speed and the size of the data, you will need to wait for the checkout to complete. Once it's complete, you will be able to query the data like you would any other Postgres database.
Alternatively, use "layered checkout" to avoid downloading all the data
The data in cityofchicago/dig-ticket-notifications-cygx-ui4j:latest
is 0 bytes. If this is too big to download all at once, or perhaps you only need to query a subset of it, you can use a layered checkout.:
sgr checkout --layered cityofchicago/dig-ticket-notifications-cygx-ui4j:latest
This will not download all the data, but it will create a schema comprised of foreign tables, that you can query as you would any other data. Splitgraph will lazily download the required objects as you query the data. In some cases, this might be faster or more efficient than a regular checkout.
Read the layered querying documentation to learn about when and why you might want to use layered queries.
Query the data with your existing tools
Once you've loaded the data into your local Splitgraph Engine, you can query it with any of your existing tools. As far as they're concerned, cityofchicago/dig-ticket-notifications-cygx-ui4j
is just another Postgres schema.