Query the Data Delivery Network
Query the DDNThe easiest way to query any data on Splitgraph is via the "Data Delivery Network" (DDN). The DDN is a single endpoint that speaks the PostgreSQL wire protocol. Any Splitgraph user can connect to it at data.splitgraph.com:5432
and query any version of over 40,000 datasets that are hosted or proxied by Splitgraph.
For example, you can query the traffic_signals_and_pedestrian_signals
table in this repository, by referencing it like:
"datahub-austintexas-gov/traffic-signals-and-pedestrian-signals-p53x-x73x:latest"."traffic_signals_and_pedestrian_signals"
or in a full query, like:
SELECT
":id", -- Socrata column ID
"detection_status", -- The current state of vehicle detection at the traffic signal.
"coa_intersection_id", -- Unique identifier of the intersection at which the signal is located. Defined by City of Austin intersection points GIS layer.
"council_district", -- The council district(s) in which the signal resides. If the signal sits along the border of one more districts, the district values are comma-separated.
"cross_st_segment_id", -- The unique street segment identifier of the cross street at which the signal is located. Derived from the city's street centerline GIS dataset.
"intersection_backplate_status", -- An indicator of whether the backplate installation is partial or complete
"n_s_ped", -- The phase number for north/south pedestrian crossing if Leading Pedestrian Intervals exist.
"detection_status_date", -- The date at which the signal's detection status was last updated.
"ped_button_type", -- Pedestrian button type.
"leading_pedestrian_interval", -- The presence of pedestrian walk time prior to green light for adjacent vehicles.
"backplate_installed", -- Yes/No indicator of whether a backplate has been installed
"construction_note_date", -- The date of the engineer note regarding signal construction.
"primary_st_block", -- The signalized intersection block number of the primary street.
"control", -- Indicates if the signal acts a primary controller or is secondary to another signal controller.
"corridor_retiming_zone", -- The corridor retiming zone of the signal for this row
"signal_id", -- Unique identifier of the traffic or pedestrian signal.
"primary_st_segment_id", -- The unique street segment identifier of the primary street at which the signal is located. Derived from the city's street centerline GIS dataset.
"cross_st_block", -- The signalized intersection block number of the cross street.
"signal_eng_area", -- The engineering area in which the signal is located. Used to group signals in geographic regions overseen by the city's traffic signal engineers.
"lpi_comments", -- The comment from signal engineer about the Leading Pedestrian Interval.
"modified_date", -- The datetime the work order record was last modified.
"landmark", -- Description of one or more landmarks at the signal location.
"comm_status_datetime_utc", -- The date at which the controller IP communication status was updated.
"traffic_eng_area", -- The engineering area in which the signal resides.
"signal_pm_max_fiscal_year", -- The most recent fiscal year in which preventative maintenance was conducted at the signal. The fiscal year begins on October 1st.
"cross_st", -- The cross street name of the signal location.
"dark_signal", -- Indicator of whether a signal has power
"ip_comm_status", -- The IP communication status of the signal controller.
"primary_st", -- The primary street name of the signal location.
":@computed_region_m2th_e4b7", -- This column was automatically created in order to record in what polygon from the dataset 'Community Registry' (m2th-e4b7) the point in column 'location' is located. This enables the creation of region maps (choropleths) in the visualization canvas and data lens.
"jurisdiction_label", -- The municipal jurisdiction in which the traffic signal is located. Jurisdiction is determined by the location of the traffic signal cabinet with relation to the city's jurisdiction layer: http://services.arcgis.com/0L95CJ0VTaxqcmED/arcgis/rest/services/BOUNDARIES_jurisdictions/FeatureServer/0
"backplate_type", -- An indicator of whether the backplate is retroreflective or non-retroreflective
":@computed_region_e9j2_6w3z", -- This column was automatically created in order to record in what polygon from the dataset 'Neighborhood Planning Areas' (e9j2-6w3z) the point in column 'location' is located. This enables the creation of region maps (choropleths) in the visualization canvas and data lens.
":@computed_region_q9nd_rr82", -- This column was automatically created in order to record in what polygon from the dataset 'BOUNDARIES_single_member_districts' (q9nd-rr82) the point in column 'location' is located. This enables the creation of region maps (choropleths) in the visualization canvas and data lens.
":@computed_region_jcrc_4uuy", -- This column was automatically created in order to record in what polygon from the dataset 'Boundaries: Zip Code Tabulation Areas, 2017' (jcrc-4uuy) the point in column 'location' is located. This enables the creation of region maps (choropleths) in the visualization canvas and data lens.
":@computed_region_8spj_utxs", -- This column was automatically created in order to record in what polygon from the dataset 'Single Member Council Districts' (8spj-utxs) the point in column 'location' is located. This enables the creation of region maps (choropleths) in the visualization canvas and data lens.
"e_w_ped", -- The phase number for east/west pedestrian crossing if Leading Pedestrian Intervals exist.
":@computed_region_rxpj_nzrk", -- This column was automatically created in order to record in what polygon from the dataset 'Zipcodes' (rxpj-nzrk) the point in column 'location' is located. This enables the creation of region maps (choropleths) in the visualization canvas and data lens.
"id", -- The signal's unique identifier in the asset management system database.
"signal_status", -- Operational status of the traffic signal.
"location_name", -- Canonical name of the location at which the traffic signal is located.
"atd_location_id", -- Unique location identifier of the signal. ATD Location IDs persist throughout data maintained by Austin Transportation, and can be used to identify assets with share a common location. E.g., traffic cameras, signals, and sensors.
"secondary_signals", -- The unique ID(s) of any signals that are controlled by this primary signal.
"signal_type", -- The type of signal. Either traffic or PHB (pedestrian hybrid beacon).
"location", -- The coordinates of the signalized intersection.
"owner", -- The transportation agency that owns the traffic signal. The City of Austin maintains signals for a number of agencies and municipalties, including the Texas Department of Transportation (TxDOT), Travis County, and the City of Sunset Valley.
"construction_note", -- Engineer note regarding signal construction.
"bus_stops", -- Capital Metro bus stops within 350ft of the signal location.
"bike_signal", -- The presence of an exclusive bicycle signal.
"date_lpi_installed", -- The approximate date that the leading pedestrian interval was installed.
"turn_on_date", -- The date that the signal was turned on.
"primary_signal" -- The name of the signal's primary signal controller, if applicable.
FROM
"datahub-austintexas-gov/traffic-signals-and-pedestrian-signals-p53x-x73x:latest"."traffic_signals_and_pedestrian_signals"
LIMIT 100;
Connecting to the DDN is easy. All you need is an existing SQL client that can connect to Postgres. As long as you have a SQL client ready, you'll be able to query datahub-austintexas-gov/traffic-signals-and-pedestrian-signals-p53x-x73x
with SQL in under 60 seconds.
Query Your Local Engine
bash -c "$(curl -sL https://github.com/splitgraph/splitgraph/releases/latest/download/install.sh)"
Read the installation docs.
Splitgraph Cloud is built around Splitgraph Core (GitHub), which includes a local Splitgraph Engine packaged as a Docker image. Splitgraph Cloud is basically a scaled-up version of that local Engine. When you query the Data Delivery Network or the REST API, we mount the relevant datasets in an Engine on our servers and execute your query on it.
It's possible to run this engine locally. You'll need a Mac, Windows or Linux system to install sgr
, and a Docker installation to run the engine. You don't need to know how to actually use Docker; sgr
can manage the image, container and volume for you.
There are a few ways to ingest data into the local engine.
For external repositories, the Splitgraph Engine can "mount" upstream data sources by using sgr mount
. This feature is built around Postgres Foreign Data Wrappers (FDW). You can write custom "mount handlers" for any upstream data source. For an example, we blogged about making a custom mount handler for HackerNews stories.
For hosted datasets (like this repository), where the author has pushed Splitgraph Images to the repository, you can "clone" and/or "checkout" the data using sgr clone
and sgr checkout
.
Cloning Data
Because datahub-austintexas-gov/traffic-signals-and-pedestrian-signals-p53x-x73x:latest
is a Splitgraph Image, you can clone the data from Spltgraph Cloud to your local engine, where you can query it like any other Postgres database, using any of your existing tools.
First, install Splitgraph if you haven't already.
Clone the metadata with sgr clone
This will be quick, and does not download the actual data.
sgr clone datahub-austintexas-gov/traffic-signals-and-pedestrian-signals-p53x-x73x
Checkout the data
Once you've cloned the data, you need to "checkout" the tag that you want. For example, to checkout the latest
tag:
sgr checkout datahub-austintexas-gov/traffic-signals-and-pedestrian-signals-p53x-x73x:latest
This will download all the objects for the latest
tag of datahub-austintexas-gov/traffic-signals-and-pedestrian-signals-p53x-x73x
and load them into the Splitgraph Engine. Depending on your connection speed and the size of the data, you will need to wait for the checkout to complete. Once it's complete, you will be able to query the data like you would any other Postgres database.
Alternatively, use "layered checkout" to avoid downloading all the data
The data in datahub-austintexas-gov/traffic-signals-and-pedestrian-signals-p53x-x73x:latest
is 0 bytes. If this is too big to download all at once, or perhaps you only need to query a subset of it, you can use a layered checkout.:
sgr checkout --layered datahub-austintexas-gov/traffic-signals-and-pedestrian-signals-p53x-x73x:latest
This will not download all the data, but it will create a schema comprised of foreign tables, that you can query as you would any other data. Splitgraph will lazily download the required objects as you query the data. In some cases, this might be faster or more efficient than a regular checkout.
Read the layered querying documentation to learn about when and why you might want to use layered queries.
Query the data with your existing tools
Once you've loaded the data into your local Splitgraph Engine, you can query it with any of your existing tools. As far as they're concerned, datahub-austintexas-gov/traffic-signals-and-pedestrian-signals-p53x-x73x
is just another Postgres schema.