Query the Data Delivery Network
Query the DDNThe easiest way to query any data on Splitgraph is via the "Data Delivery Network" (DDN). The DDN is a single endpoint that speaks the PostgreSQL wire protocol. Any Splitgraph user can connect to it at data.splitgraph.com:5432
and query any version of over 40,000 datasets that are hosted or proxied by Splitgraph.
For example, you can query the new_york_hazardous_waste_manifest_data_2006_2018
table in this repository, by referencing it like:
"ny-gov/new-york-hazardous-waste-manifest-data-2006-2018-ys8p-aqus:latest"."new_york_hazardous_waste_manifest_data_2006_2018"
or in a full query, like:
SELECT
":id", -- Socrata column ID
"unit_of_measure", -- Unit of Measure - the unit of measure for the Waste Quantity. The unit of measure codes are defined as follows: G - Gallons (liquids only); K - Kilograms (2.2 pounds); L - Liters (liquids only); M - Metric tons (1,000 kilograms) (2,200 pounds); N - Cubic meters*; P – Pounds; T - Tons (2,000 pounds); Y - Cubic yards
"export_ind", -- Export Indicator- a value of "Y" indicates that the waste shipment is being exported from the United States of America to another country. A value of "N" indicates that the waste shipment is not being exported from the United States of America to another country.
"transporter_1_rcra_id", -- Transporter RCRA Identification Number 1 – the alphanumeric ID number assigned to the first hazardous waste transporter named on the manifest. A value of “Unknown” means that an ID number was not provided. A value of “Unreadable” means that the provided ID number was illegible.
"discr_type_ind", -- Discrepancy Type Indicator- Indication that the Type Discrepancy checkbox has been checked on the manifest. A value of "Y" indicates that the box has been checked because there was a discrepancy. A value of "N" indicates that the box has not been checked. A type discrepancy means that the type of waste indicated on the manifest did not match the type of waste that arrived at the TSDF facility.
"gen_sign_date", -- Generator Signature Date - the date upon which the generator signed the manifest and released custody of their waste shipment to the transporter.
"alt_facility_rcra_id", -- Alternate Facility RCRA Identification Number- the alphanumeric ID number assigned to an alternate hazardous waste treatment, storage, or disposal facility (TSDF) named on the manifest or the original generator of the waste. In cases where the original TSDF cannot accept a particular waste shipment, an alternate TSDF facility may be added to the manifest and will receive the waste for treatment, storage, or disposal. The original TSDF can also send the waste back to the generator of the waste. A blank value means that a value was not entered into the Alternative Facility box on the manifest because the waste shipment was accepted by the original TSDF.
"gen_rcra_id", -- Generator RCRA Identification Number - the alphanumeric ID number assigned to the hazardous waste generator named on the manifest. The generator is the site location that is sending the waste off-site for treatment, storage, or disposal.
"line_item_num", -- Waste Line Item Number - this number indicates the waste line on the manifest. A manifest can have multiple waste lines, one for each individual waste that is part of the same shipment.
"discr_quantity_ind", -- Discrepancy Quantity Indicator- Indication that the Quantity discrepancy checkbox has been checked on the manifest. A value of "Y" indicates that the box has been checked because there was a discrepancy. A value of "N" indicates that the box has not been checked. A quantity discrepancy means that the amount of waste indicated on the manifest did not match the physical amount of waste that arrived at the TSDF facility.
"waste_code_5", -- Waste Code 5-federal or state waste code that describes the waste listed on the waste line. Federal waste codes are found in the following regulations: 40 CFR 261 Subparts C and D (https://www.law.cornell.edu/cfr/text/40/part-261). State waste codes are found in the following regulations: 6 NYCRR 371.3 (https://govt.westlaw.com/nycrr/Document/I4eacc3f5cd1711dda432a117e6e0f345?contextData=%28sc.Default%29&transitionType=Default) and 371.4 (https://govt.westlaw.com/nycrr/Document/I4eacc3f8cd1711dda432a117e6e0f345?contextData=%28sc.Default%29&bhcp=1&transitionType=Default). Most federal and state waste codes are the same, but New York also has state waste codes for certain PCB wastes. Each hazardous waste line must have a waste code in at least one of the six waste code boxes. All applicable waste codes have to be listed on the manifest unless there are more than six applicable waste codes. A blank value means that a value was not entered into the fifth waste code box on the manifest.
"container_type_code", -- Container Type Code - the code used to indicate the type of container that the waste is being shipped in. The container type codes are defined as follows: BA - Burlap, cloth, plastic, or paper bag; CF - Fiber or plastic boxes, cartons, cases; CM - Metal boxes, cartons, cases (including roll-offs); CW - Wooden boxes, cartons, cases; CY – Cylinders; DF - Fiberboard or plastic drums, barrels, kegs; DM - Metal drums, barrels, kegs; DT - Dump trucks; DW - Wooden drums, barrels, kegs; HG - Hopper or gondola cars; TC - Tank cars; TP- Portable tanks; TT - Cargo tank (tank trucks)
"alt_facility_sign_date", -- Alternate Facility Signature Date - the date upon which the Alternate Facility signed the manifest and received the waste shipment. A blank value means that a value was not entered into the Alternative Facility box on the manifest because the waste shipment was accepted by the original TSDF.
"tsdf_rcra_id", -- TSDF RCRA Identification Number-the alphanumeric ID number assigned to the Designated Facility (also known as the original hazardous waste treatment, storage, or disposal facility [TSDF]) named on the manifest. The TSDF is the end facility and receives the waste that is sent for treatment, storage, or disposal.
"tsdf_sign_date", -- TSDF Signature Date - the date upon which the TSDF signed the manifest and received the waste shipment from the final transporter.
"handling_type_code", -- Handling Code - a New York State-specific code that indicates the ultimate fate of the waste. There are four possible codes (l, B, T, R) that are defined as follows: L- Landfill; B - Incineration, heat recovery, burning; T - Chemical, physical, or biological treatment; R - Material recovery of more than 75 percent of the total material
"waste_qty", -- Waste Quantity - the amount of the waste, measured by weight or volume
"transporter_2_rcra_id", -- Transporter RCRA Identification Number 2 - the alphanumeric ID number assigned to the second hazardous waste transporter named on the manifest. A blank value means that a value was not entered into the Transporter 2 box on the manifest. (Optional field if there is only one transporter for a shipment.)
"manifest_tracking_num", -- Manifest Tracking Number-the alphanumeric code assigned to each individual hazardous waste manifest.
"num_of_containers", -- Number of Containers- the number of containers of the waste
"waste_code_3", -- Waste Code 3 -federal or state waste code that describes the waste listed on the waste line. Federal waste codes are found in the following regulations: 40 CFR 261 Subparts C and D (https://www.law.cornell.edu/cfr/text/40/part-261). State waste codes are found in the following regulations: 6 NYCRR 371.3 (https://govt.westlaw.com/nycrr/Document/I4eacc3f5cd1711dda432a117e6e0f345?contextData=%28sc.Default%29&transitionType=Default) and 371.4 (https://govt.westlaw.com/nycrr/Document/I4eacc3f8cd1711dda432a117e6e0f345?contextData=%28sc.Default%29&bhcp=1&transitionType=Default). Most federal and state waste codes are the same, but New York also has state waste codes for certain PCB wastes. Each hazardous waste line must have a waste code in at least one of the six waste code boxes. All applicable waste codes have to be listed on the manifest unless there are more than, six applicable waste codes. A blank value means that a value was not entered into the third waste code box on the manifest.
"mgmt_method_type_code", -- Management Method Code- the method management code describes the type of hazardous waste management system used to treat, recover, or dispose of a hazardous waste. This code cannot always be used to determine the ultimate fate of the waste.
"discr_partial_reject_ind", -- Discrepancy Partial Rejection Indicator- indication that the Partial Rejection checkbox has been checked on the manifest. A value of "Y" indicates that the box has been checked because there was a rejection. A value of "N" indicates that the box has not been checked. A partial rejection means that some of the waste on the manifest could not be accepted by the TSDF facility indicated on the manifest and that waste was either sent back to the generator of the waste or to an alternate facility that could accept the waste.
"transporter_1_sign_date", -- Transporter Signature Date 1- the date upon which the first transporter signed the manifest and received the waste shipment from the generator for transport.
"page_num", -- Page Number - the page number of the manifest associated with the data. Manifests can have multiple continuation sheets (pages) depending upon the amount of waste sent in a single shipment.
"waste_code_1", -- Waste Code 1-federal or state waste code that describes the waste listed on the waste line. Federal waste codes are found in the following regulations: 40 CFR 261 Subparts C and D (https://www.law.cornell.edu/cfr/text/40/part-261). State waste codes are found in the following regulations: 6 NYCRR 371.3 (https://govt.westlaw.com/nycrr/Document/I4eacc3f5cd1711dda432a117e6e0f345?contextData=%28sc.Default%29&transitionType=Default) and 371.4 (https://govt.westlaw.com/nycrr/Document/I4eacc3f8cd1711dda432a117e6e0f345?contextData=%28sc.Default%29&bhcp=1&transitionType=Default). Most federal and state waste codes are the same, but New York also has state waste codes for certain PCB wastes. Each hazardous waste line must have a waste code in at least one of the six waste code boxes. All applicable waste codes have to be listed on the manifest unless there are more than six applicable waste codes.
"discr_full_reject_ind", -- Discrepancy Full Rejection Indicator- indication that the Full Rejection checkbox is checked has been checked on the manifest. A value of "Y" indicates that the box has been checked because there was a rejection. A value of "N" indicates that the box has not been checked. A full rejection means that the TSDF facility could not accept the waste indicated on the manifest and that all of the waste in the shipment was either sent back to the generator of the waste or to an alternate facility that could accept the waste.
"waste_code_4", -- Waste Code 4-federal or state waste code that describes the waste listed on the waste line. Federal waste codes are found in the following regulations: 40 CFR 261 Subparts C and D (https://www.law.cornell.edu/cfr/text/40/part-261). State waste codes are found in the following regulations: 6 NYCRR 371.3 (https://govt.westlaw.com/nycrr/Document/I4eacc3f5cd1711dda432a117e6e0f345?contextData=%28sc.Default%29&transitionType=Default) and 371.4 (https://govt.westlaw.com/nycrr/Document/I4eacc3f8cd1711dda432a117e6e0f345?contextData=%28sc.Default%29&bhcp=1&transitionType=Default). Most federal and state waste codes are the same, but New York also has state waste codes for certain PCB wastes. Each hazardous waste line must have a waste code in at least one of the six waste code boxes. All applicable waste codes have to be listed on the manifest unless there are more than six applicable waste codes. A blank value means that a value was not entered into the fourth waste code box on the manifest.
"manifest_ref_num", -- Manifest Reference Number- the Manifest Reference Number is used to record the manifest tracking number of an additional manifest if an additional manifest is being used to track a rejected waste or a residue shipment from the original manifest. This should only be provided if one of the following boxes is checked on the manifest: Full, Partial, or Residue Rejection.
"waste_code_6", -- Waste Code 6-federal or state waste code that describes the waste listed on the waste line. Federal waste codes are found in the following regulations: 40 CFR 261 Subparts C and D (https://www.law.cornell.edu/cfr/text/40/part-261). State waste codes are found in the following regulations: 6 NYCRR 371.3 (https://govt.westlaw.com/nycrr/Document/I4eacc3f5cd1711dda432a117e6e0f345?contextData=%28sc.Default%29&transitionType=Default) and 371.4 (https://govt.westlaw.com/nycrr/Document/I4eacc3f8cd1711dda432a117e6e0f345?contextData=%28sc.Default%29&bhcp=1&transitionType=Default). Most federal and state waste codes are the same, but New York also has state waste codes for certain PCB wastes. Each hazardous waste line must have a waste code in at least one of the six waste code boxes. All applicable waste codes have to be listed on the manifest unless there are more than six applicable waste codes. A blank value means that a value was not entered into the sixth waste code box on the manifest.
"discr_residue_ind", -- Discrepancy Residue Indicator-indication that the Residue Discrepancy checkbox has been checked on the manifest. A value of "Y" indicates that the box has been checked because residue from waste on that manifest was sent to a-different facility. A value of "N" indicates that the box has not been checked. A residue discrepancy means that residue from a waste indicated on the manifest was sent on to a different TSDF facility.
"waste_code_2", -- Waste Code 2 -federal or state waste code that describes the waste listed on the waste line. Federal waste codes are found in the following regulations: 40 CFR 261 Subparts C and D (https://www.law.cornell.edu/cfr/text/40/part-261). State waste codes are found in the following regulations: 6 NYCRR 371.3 (https://govt.westlaw.com/nycrr/Document/I4eacc3f5cd1711dda432a117e6e0f345?contextData=%28sc.Default%29&transitionType=Default) and 371.4 (https://govt.westlaw.com/nycrr/Document/I4eacc3f8cd1711dda432a117e6e0f345?contextData=%28sc.Default%29&bhcp=1&transitionType=Default). Most federal and state waste codes are the same, but New York also has state waste codes for certain PCB wastes. Each hazardous waste line must have a waste code in at least one of the six waste code boxes. All applicable waste codes have to be listed on the manifest unless there are more than six applicable waste codes. A blank value means that a value was not entered into the second waste code box on the manifest.
"specific_gravity", -- Specific Gravity- the ratio of the density of the waste to the density of water.
"import_ind", -- Import Indicator- a value of "Y" indicates that the waste shipment is being imported from another country into the United States of America. A value of "N" indicates that the waste shipment is not being imported from another country into the United States of America.
"transporter_2_sign_date" -- Transporter Signature Date 2 - the date upon which the second transporter signed the manifest and received the waste shipment from the first transporter for transport. A blank value means that a value was not entered into the Transporter 2 box on the manifest. (Optional field if there is only one transporter for a shipment.)
FROM
"ny-gov/new-york-hazardous-waste-manifest-data-2006-2018-ys8p-aqus:latest"."new_york_hazardous_waste_manifest_data_2006_2018"
LIMIT 100;
Connecting to the DDN is easy. All you need is an existing SQL client that can connect to Postgres. As long as you have a SQL client ready, you'll be able to query ny-gov/new-york-hazardous-waste-manifest-data-2006-2018-ys8p-aqus
with SQL in under 60 seconds.
Query Your Local Engine
bash -c "$(curl -sL https://github.com/splitgraph/splitgraph/releases/latest/download/install.sh)"
Read the installation docs.
Splitgraph Cloud is built around Splitgraph Core (GitHub), which includes a local Splitgraph Engine packaged as a Docker image. Splitgraph Cloud is basically a scaled-up version of that local Engine. When you query the Data Delivery Network or the REST API, we mount the relevant datasets in an Engine on our servers and execute your query on it.
It's possible to run this engine locally. You'll need a Mac, Windows or Linux system to install sgr
, and a Docker installation to run the engine. You don't need to know how to actually use Docker; sgr
can manage the image, container and volume for you.
There are a few ways to ingest data into the local engine.
For external repositories, the Splitgraph Engine can "mount" upstream data sources by using sgr mount
. This feature is built around Postgres Foreign Data Wrappers (FDW). You can write custom "mount handlers" for any upstream data source. For an example, we blogged about making a custom mount handler for HackerNews stories.
For hosted datasets (like this repository), where the author has pushed Splitgraph Images to the repository, you can "clone" and/or "checkout" the data using sgr clone
and sgr checkout
.
Cloning Data
Because ny-gov/new-york-hazardous-waste-manifest-data-2006-2018-ys8p-aqus:latest
is a Splitgraph Image, you can clone the data from Spltgraph Cloud to your local engine, where you can query it like any other Postgres database, using any of your existing tools.
First, install Splitgraph if you haven't already.
Clone the metadata with sgr clone
This will be quick, and does not download the actual data.
sgr clone ny-gov/new-york-hazardous-waste-manifest-data-2006-2018-ys8p-aqus
Checkout the data
Once you've cloned the data, you need to "checkout" the tag that you want. For example, to checkout the latest
tag:
sgr checkout ny-gov/new-york-hazardous-waste-manifest-data-2006-2018-ys8p-aqus:latest
This will download all the objects for the latest
tag of ny-gov/new-york-hazardous-waste-manifest-data-2006-2018-ys8p-aqus
and load them into the Splitgraph Engine. Depending on your connection speed and the size of the data, you will need to wait for the checkout to complete. Once it's complete, you will be able to query the data like you would any other Postgres database.
Alternatively, use "layered checkout" to avoid downloading all the data
The data in ny-gov/new-york-hazardous-waste-manifest-data-2006-2018-ys8p-aqus:latest
is 0 bytes. If this is too big to download all at once, or perhaps you only need to query a subset of it, you can use a layered checkout.:
sgr checkout --layered ny-gov/new-york-hazardous-waste-manifest-data-2006-2018-ys8p-aqus:latest
This will not download all the data, but it will create a schema comprised of foreign tables, that you can query as you would any other data. Splitgraph will lazily download the required objects as you query the data. In some cases, this might be faster or more efficient than a regular checkout.
Read the layered querying documentation to learn about when and why you might want to use layered queries.
Query the data with your existing tools
Once you've loaded the data into your local Splitgraph Engine, you can query it with any of your existing tools. As far as they're concerned, ny-gov/new-york-hazardous-waste-manifest-data-2006-2018-ys8p-aqus
is just another Postgres schema.