Query the Data Delivery Network

Query the DDN

The easiest way to query any data on Splitgraph is via the "Data Delivery Network" (DDN). The DDN is a single endpoint that speaks the PostgreSQL wire protocol. Any Splitgraph user can connect to it at data.splitgraph.com:5432 and query any version of over 40,000 datasets that are hosted or proxied by Splitgraph.

For example, you can query the 811_chicago_utility_hit_tickets table in this repository, by referencing it like:

"cityofchicago/811-chicago-utility-hit-tickets-nphj-5zur:latest"."811_chicago_utility_hit_tickets"

or in a full query, like:

SELECT
    ":id", -- Socrata column ID
    ":@computed_region_43wa_7qmu", -- This column was automatically created in order to record in what polygon from the dataset 'Wards' (43wa-7qmu) the point in column 'location' is located.  This enables the creation of region maps (choropleths) in the visualization canvas and data lens.
    "facility_hit", -- The type of facility that was damaged as described by the individual or entity reporting the hit. The accuracy and reliability of data in this field is subject to the reporter’s perception. Cells with the value “Unknown” means the individual or entity reporting the hit was unable to determine what type of facility was damaged.
    "street_number", -- The beginning of the address range of the location where the damage occurred. If the “street number” and street number high” fields are different, the individual or entity reporting the damage stated the damage facility spans two or more addresses; if the street number” and "street number high” fields are the same, or the field is null, the damage was reported at a single address. Of note, hit tickets (and dig tickets) are processed by block and side of the street. For example, a hit ticket with an odd street address indicates the southside of an east / west street, or the eastside of a north / south street. The address range on the hit ticket cannot exceed the first and last address of the block.
    ":@computed_region_6mkv_f3dw", -- This column was automatically created in order to record in what polygon from the dataset 'Zip Codes' (6mkv-f3dw) the point in column 'location' is located.  This enables the creation of region maps (choropleths) in the visualization canvas and data lens.
    ":@computed_region_vrxf_vc4k", -- This column was automatically created in order to record in what polygon from the dataset 'Community Areas' (vrxf-vc4k) the point in column 'location' is located.  This enables the creation of region maps (choropleths) in the visualization canvas and data lens.
    "hit_date", -- The date and time the individual or entity reporting the hit stated the hit occurred. 
    ":@computed_region_rpca_8um6", -- This column was automatically created in order to record in what polygon from the dataset 'Boundaries - ZIP Codes' (rpca-8um6) the point in column 'location' is located.  This enables the creation of region maps (choropleths) in the visualization canvas and data lens.
    "reported_by_name", -- The name of the individual or entity which reported the damage to 811 Chicago. 
    "facility_owner", -- The name of the organization which owns the facility that was damaged as described by the individual or entity reporting the hit. The accuracy and reliability of data in this field is subject to the reporter’s perception. Cells with a null value mean the individual or entity reporting the hit was unable to determine who owned the damaged facility. 
    "location", -- The location of the Street Address column in a format that allows for mapping and other geographic analysis on this portal.
    "site_id", -- The unique site number associated with each utility owner’s shapefile stored in the 811 Chicago database. 
    "hit_report_no", -- The identifier given to each hit report. Users can look up the hit report using this number on the 811 Chicago website. While this number is unique to the hit report, it is repeated in this data set because there can be multiple combinations of Utility Name, Site ID, and Hit Report Status for a hit report.
    "notice_type", -- This field is a flag for utility owners (who are notified of a potential damage to their facility via electronic transmission of the hit ticket data) that signifies the iteration of the ticket. As it relates to hit tickets, the notice type could be one of three types: “Initial ticket,” or the first iteration of the hit ticket; “Cancellation notice,” or 811 Chicago has canceled the ticket because it is incorrect; “Update ticket,” or 811 Chicago has updated information on the ticket and is retransmitting it. Notice type is a more robust field for “dig tickets.” Please see the “Dig Ticket Notification” dataset.
    "hit_by_name", -- The name of the individual or entity which damaged the facility according to the individual or entity that reported the damage. The accuracy and reliability of data in this field is subject to the reporter’s perception. Cells with a null value mean the individual or entity reporting the hit was unable to determine who damaged the facility. Please note, the inclusion of an individual or entity in this field did not necessarily mean that entity caused the damage or that a violation of MCC § 10-21 occurred. 811 Chicago’s investigation will determine who caused the damage. 811 Chicago may not update the hit report if the “hit by” field is deemed incorrect. 
    "street_address", -- The full street address, using the low street number if there is a range.
    ":@computed_region_bdys_3d7i", -- This column was automatically created in order to record in what polygon from the dataset 'Census Tracts' (bdys-3d7i) the point in column 'location' is located.  This enables the creation of region maps (choropleths) in the visualization canvas and data lens.
    "utility_name", -- The name of the utility company(s) to which the hit ticket was transmitted. Hit reports may be sent to more than one utility depending on the facility that was damaged. For example, Peoples Gas has three “sites” (see “Site ID” field description) in the 811 Chicago database. Depending on the location of the damage, 811 Chicago may transmit the ticket to all three of their sites. If a “unknown” (see “Facility Hit” and “Facility Owner” field descriptions) facility is damaged, 811 Chicago may transmit the ticket to some or all facility owners determined to have utilities in the area of the damage in hopes of identifying the facility. Once 811 Chicago transmits the ticket, the utility owner processes the ticket according to their business rules.  
    "street_number_high", -- The end of the address range of the location where the damage occurred.
    "direction", -- The direction portion of the street address where the damage occurred.
    "street_name", -- The street where the damage occurred.
    "related_dig_ticket", -- Any individual or entity excavating in Chicago city limits is required to obtain a “dig ticket” from 811 Chicago prior to breaking ground. For more information on dig tickets, please visit the 811 Chicago site and the “Dig Ticket Notification” dataset. 811 Chicago asks individuals or entities reporting a potential utility damage if there is a dig ticket associated with the excavation activity. 811 Chicago will record that information in this field. A null value in the field does not necessarily mean there is no associated dig ticket.
    "hit_report_status", -- The “status” is based on an action in the workflow. There are six potential statuses: “Canceled,” or the hit report was canceled and is no longer valid; “Complete,” or the 811 investigation workflow was completed; “Duplicate,” or the damage has already been reported on a different hit ticket (please note, 811 Chicago may not identify all duplicate damages); “AutoAppr,” or the hit report was processed without 811 Chicago reviewing the report (e.g., submitted by a user with elevated permissions); “Approved,” or 811 Chicago processed the hit report (though, as explained in this data set’s description, the information in the hit report will not be verified until the investigation); “Rejected,” or 811 Chicago reviewed the hit report but did not process it.
    "suffix" -- The suffix portion of the street name where the damage occurred  
FROM
    "cityofchicago/811-chicago-utility-hit-tickets-nphj-5zur:latest"."811_chicago_utility_hit_tickets"
LIMIT 100;

Connecting to the DDN is easy. All you need is an existing SQL client that can connect to Postgres. As long as you have a SQL client ready, you'll be able to query cityofchicago/811-chicago-utility-hit-tickets-nphj-5zur with SQL in under 60 seconds.

Query Your Local Engine

Install Splitgraph Locally
bash -c "$(curl -sL https://github.com/splitgraph/splitgraph/releases/latest/download/install.sh)"
 

Read the installation docs.

Splitgraph Cloud is built around Splitgraph Core (GitHub), which includes a local Splitgraph Engine packaged as a Docker image. Splitgraph Cloud is basically a scaled-up version of that local Engine. When you query the Data Delivery Network or the REST API, we mount the relevant datasets in an Engine on our servers and execute your query on it.

It's possible to run this engine locally. You'll need a Mac, Windows or Linux system to install sgr, and a Docker installation to run the engine. You don't need to know how to actually use Docker; sgrcan manage the image, container and volume for you.

There are a few ways to ingest data into the local engine.

For external repositories, the Splitgraph Engine can "mount" upstream data sources by using sgr mount. This feature is built around Postgres Foreign Data Wrappers (FDW). You can write custom "mount handlers" for any upstream data source. For an example, we blogged about making a custom mount handler for HackerNews stories.

For hosted datasets (like this repository), where the author has pushed Splitgraph Images to the repository, you can "clone" and/or "checkout" the data using sgr cloneand sgr checkout.

Cloning Data

Because cityofchicago/811-chicago-utility-hit-tickets-nphj-5zur:latest is a Splitgraph Image, you can clone the data from Spltgraph Cloud to your local engine, where you can query it like any other Postgres database, using any of your existing tools.

First, install Splitgraph if you haven't already.

Clone the metadata with sgr clone

This will be quick, and does not download the actual data.

sgr clone cityofchicago/811-chicago-utility-hit-tickets-nphj-5zur

Checkout the data

Once you've cloned the data, you need to "checkout" the tag that you want. For example, to checkout the latest tag:

sgr checkout cityofchicago/811-chicago-utility-hit-tickets-nphj-5zur:latest

This will download all the objects for the latest tag of cityofchicago/811-chicago-utility-hit-tickets-nphj-5zur and load them into the Splitgraph Engine. Depending on your connection speed and the size of the data, you will need to wait for the checkout to complete. Once it's complete, you will be able to query the data like you would any other Postgres database.

Alternatively, use "layered checkout" to avoid downloading all the data

The data in cityofchicago/811-chicago-utility-hit-tickets-nphj-5zur:latest is 0 bytes. If this is too big to download all at once, or perhaps you only need to query a subset of it, you can use a layered checkout.:

sgr checkout --layered cityofchicago/811-chicago-utility-hit-tickets-nphj-5zur:latest

This will not download all the data, but it will create a schema comprised of foreign tables, that you can query as you would any other data. Splitgraph will lazily download the required objects as you query the data. In some cases, this might be faster or more efficient than a regular checkout.

Read the layered querying documentation to learn about when and why you might want to use layered queries.

Query the data with your existing tools

Once you've loaded the data into your local Splitgraph Engine, you can query it with any of your existing tools. As far as they're concerned, cityofchicago/811-chicago-utility-hit-tickets-nphj-5zur is just another Postgres schema.

Related Documentation:

Loading...