Query the Data Delivery Network
Query the DDN
The easiest way to query any data on Splitgraph is via the "Data Delivery Network" (DDN). The DDN is a single endpoint that speaks the PostgreSQL wire protocol. Any Splitgraph user can connect to it at data.splitgraph.com:5432 and query any version of over 40,000 datasets that are hosted or proxied by Splitgraph.
For example, you can query the biodiversity_by_county_distribution_of_animals table in this repository by referencing it like:
"ny-gov/biodiversity-by-county-distribution-of-animals-tk82-7km5:latest"."biodiversity_by_county_distribution_of_animals"
or in a full query, like:
SELECT
":id", -- Socrata column ID
"ny_listing_status", -- For animals and plants, the legal protected status under New York State Environmental Conservation Law (ECL) and under New York State regulations. The highest level of protection is given to species listed by New York State as Endangered or Threatened. Regulations regarding animals are administered by NYS DEC’s Division of Fish, Wildlife, and Marine Resources. Regulations regarding plants are administered by NYS DEC’s Division of Lands and Forests. For Animals, categories of Endangered, Threatened, and Special Concern species are defined in New York State ECL §11-0535. Endangered, Threatened, and Special Concern species are listed in regulation 6NYCRR 182.5, and at http://www.dec.ny.gov/animals/7494.html. For Plants, categories of Endangered, Threatened, Rare, and Exploitably Vulnerable are defined in ECL §9-1503. Plants in these categories are protected against picking, removal, or damaging with herbicides without the consent of the landowner. Endangered, Threatened, Rare, and Exploitably Vulnerable species are listed in regulation 6NYCRR 193.3, and at http://www.dec.ny.gov/regs/15522.html.
"federal_listing_status", -- For animals and plants, the listing status under the U.S. Endangered Species Act, as it applies to populations of the species in New York State. Listing provides legal protection for this species at the federal level. Listing categories are Endangered, Threatened, or Candidate. As defined by the Act, endangered refers to species that are "in danger of extinction within the foreseeable future throughout all or a significant portion of its range," while threatened refers to “those animals and plants likely to become endangered within the foreseeable future throughout all or a significant portion of their ranges.”
"distribution_status", -- Status of the presence of the species or natural community type in the given county, as recorded in the dataset’s source databases. Values are: Recently confirmed = Documented, with confirmed identification, since 1980. Historically confirmed = Last documented, with confirmed identification, before 1980; current presence is unknown, but could still be present. Possible, but not confirmed = Has not been documented but has been confirmed nearby, or has been reported but identification has not been confirmed. Extirpated = Has been documented in the past, but is now believed to no longer occur in the given county.
"scientific_name", -- For plants and animals, the scientific name used in the database of the New York Natural Heritage Program. Names are based on generally accepted references, augmented by recent scientific literature and expert opinion. For natural communities, the names of community types are documented in New York Natural Heritage’s “Ecological Communities of New York State, Second Edition” (draft).
"global_conservation_rank", -- A rank assigned by New York Natural Heritage to each species and community type indicating how imperiled the species or community type is throughout the world. The global conservation rank is based on how rare the species or community type is across its global range, and on population trends and threats. For species, these ranks provide an estimate of extinction risk; while for natural communities, they provide an estimate of the risk of elimination. As new information becomes available, ranks may be revised. The ranks are based on a one to five scale, ranging from G1 = critically imperiled to G5 = demonstrably secure (common and widespread). Natural Heritage conservation status ranks carry no legal or regulatory weight. Basic global conservation ranks are: G1 – Critically Imperiled (very high risk of extinction) G2 – Imperiled (high risk of extinction) G3 – Vulnerable (moderate risk of extinction) G4 – Apparently Secure (uncommon but not rare) G5 – Definitely Secure (common and widespread) GH – Possibly Extinct: not seen anywhere in last 30, but could still exist GX – Extinct: no longer present anywhere in the world GU – Unrankable: Currently unrankable due to lack of information or due to substantially conflicting information about status or trends. GNR – Not Ranked: global conservation status not yet assessed. GNA – Not Applicable, because the species is not a suitable target for conservation activities (e.g., species is a hybrid, or a domesticated species). Variations of these ranks include: • Range ranks, such as G1G2, indicate not enough information is available to distinguish between two single ranks. • ? after a rank, such as G2?, indicates some uncertainty about the true rank, but is most likely the assigned rank. • T ranks, such as T3, indicate the rank applies to a subspecies or variety, but not to the species as a whole.
"year_last_documented", -- The most recent year the species or community type was observed in the given county, as documented in the dataset’s source databases. A value of “2000 – 2005” indicates that the species was most recently documented during the second NYS Breeding Bird Atlas Project, conducted from 2000 to 2005. A value of “1990-1999” indicates that the species was most recently documented during the NY Amphibian and Reptile Atlas Project, conducted from 1990 to 1999. A value of “not available” indicates that the species or community type has been recorded in the given county, but no date is available.
"common_name", -- For plants and animals, the common name is its “plain English” name, as used in the database of the New York Natural Heritage Program. Names are based on generally accepted references, augmented by recent scientific literature and expert opinion. For natural communities, the names of community types are documented in New York Natural Heritage’s “Ecological Communities of New York State, Second Edition” (draft).
"category", -- Category of the species or community: Animal, Plant, or Natural Community.
"county", -- Name of New York State County. In addition to New York’s 62 Counties, the dataset also includes separate entries for offshore open waters that are part of New York State but that are not within the jurisdiction of any county: Lake Ontario, Lake Erie, and Atlantic Ocean/Long Island Sound.
"taxonomic_subgroup", -- For animals and plants, a lower level of taxonomic group than the Taxonomic Group (above). The subgroup is the taxonomic phylum, class, order, or family to which the species belong. Subgroups are not always equivalent to a single taxonomic group, and they are given English names. For natural communities, subgroup is the subsystem to which the natural community belongs. Marine and tidal wetland systems are divided into subtidal and intertidal subsystems. The freshwater nontidal wetlands system is divided into open mineral soil wetlands, forested mineral soil wetlands, open peatlands, and forested peatlands. The uplands system is divided into open uplands, barrens and woodlands, and forested uplands. The rivers and streams, lakes and ponds, and subterranean systems each have one subsystem.
"taxonomic_group", -- For animals and plants, the taxonomic phylum, class, or order to which the species belongs. Groups are not always equivalent to a single taxonomic group, and they are given English names. For natural communities, group is the system to which the natural community belongs. Natural communities are grouped into seven systems: marine, tidal wetlands (estuarine), rivers and streams (riverine), lakes and ponds (lacustrine), freshwater nontidal wetlands (palustrine), uplands (terrestrial), and subterranean (caves).
"state_conservation_rank" -- A rank assigned by New York Natural Heritage to each species and community type indicating how imperiled it is in New York State. The state conservation rank is based on how rare or abundant the species or community type is in New York, its distribution, and on population trends and threats. As new information becomes available, ranks may be revised. The ranks are based on a one to five scale, ranging from S1 = critically imperiled to S5 = demonstrably secure (common and widespread). Natural Heritage conservation status ranks carry no legal or regulatory weight. Basic state conservation ranks are: S1 – Critically Imperiled in New York State S2 – Imperiled in New York State S3 – Vulnerable in New York State S4 – Apparently Secure in New York State S5 – Definitely Secure in New York State (common and widespread) SH – Historical in New York: not seen since before 1980, but could still be present SX—Extirpated: no longer present in New York SU – Unrankable: Currently unrankable due to lack of information or due to substantially conflicting information about status or trends. SNR – Not Ranked: state conservation status not yet assessed. SNA – Not Applicable, because the species is not a suitable target for conservation activities (e.g., species is a hybrid, a domesticated species, not native to New York, an accidental or infrequent visitor outside of its normal range, a transient migrant just passing through the state, or a species with only unconfirmed or doubtful reports). Variations of these ranks include: • Range ranks, such as S1S2, indicate not enough information is available to distinguish between two single ranks. • ? after a rank, such as S2?, indicates some uncertainty about the true rank, but is most likely the assigned rank. • B after a rank, such as S2B, indicates the rank applies to the breeding populations in New York of a migratory animal. 
-- • N after a rank, such as S3N, indicates the rank applies to the non-breeding populations in New York of a migratory animal.
FROM
"ny-gov/biodiversity-by-county-distribution-of-animals-tk82-7km5:latest"."biodiversity_by_county_distribution_of_animals"
LIMIT 100;
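The table reference used in the FROM clause above follows the pattern "namespace/repository:tag"."table". As a minimal sketch, the helper below assembles that reference from its parts. The function name ddn_table_ref is hypothetical and not part of any Splitgraph tool, and it assumes none of the parts contain a double quote.

```shell
# ddn_table_ref NAMESPACE REPOSITORY TAG TABLE
# Hypothetical helper: prints a quoted, schema-qualified DDN table reference.
# Assumes the arguments contain no double-quote characters.
ddn_table_ref() {
  printf '"%s/%s:%s"."%s"\n' "$1" "$2" "$3" "$4"
}

# Reproduces the reference shown earlier for this repository.
ddn_table_ref ny-gov \
  biodiversity-by-county-distribution-of-animals-tk82-7km5 \
  latest \
  biodiversity_by_county_distribution_of_animals
```

Both halves need double quotes because the schema part contains characters ("/", "-", ":") that Postgres does not allow in unquoted identifiers.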
Connecting to the DDN requires only a SQL client that can connect to Postgres. With a client ready, you'll be able to query ny-gov/biodiversity-by-county-distribution-of-animals-tk82-7km5 with SQL in under 60 seconds.
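As one possible way to connect, the sketch below uses psql with a connection URI. The host and port come from the text above. The database name ddn and the environment variable names DDN_USER and DDN_PASSWORD are assumptions made for illustration, so check your Splitgraph account for the actual connection details. The live query only runs when both variables are set and psql is installed.

```shell
DDN_HOST="data.splitgraph.com"
DDN_PORT="5432"
DDN_DATABASE="ddn"  # assumption: confirm the database name for your account

# ddn_uri USER PASSWORD
# Hypothetical helper: prints a libpq connection URI for the DDN.
# Credentials containing special characters would need percent-encoding.
ddn_uri() {
  printf 'postgresql://%s:%s@%s:%s/%s\n' \
    "$1" "$2" "$DDN_HOST" "$DDN_PORT" "$DDN_DATABASE"
}

# Attempt a live query only when credentials and psql are both available.
if [ -n "${DDN_USER:-}" ] && [ -n "${DDN_PASSWORD:-}" ] \
   && command -v psql >/dev/null 2>&1; then
  psql "$(ddn_uri "$DDN_USER" "$DDN_PASSWORD")" -c \
    'SELECT "county", "common_name" FROM "ny-gov/biodiversity-by-county-distribution-of-animals-tk82-7km5:latest"."biodiversity_by_county_distribution_of_animals" LIMIT 5;'
fi
```

Any other Postgres client should work the same way, given the same host, port and credentials.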
Query Your Local Engine
bash -c "$(curl -sL https://github.com/splitgraph/splitgraph/releases/latest/download/install.sh)"
Read the installation docs.
Splitgraph Cloud is built around Splitgraph Core (GitHub), which includes a local Splitgraph Engine packaged as a Docker image. Splitgraph Cloud is basically a scaled-up version of that local Engine. When you query the Data Delivery Network or the REST API, we mount the relevant datasets in an Engine on our servers and execute your query on it.
It's possible to run this engine locally. You'll need a Mac, Windows or Linux system to install sgr, and a Docker installation to run the engine. You don't need to know how to actually use Docker; sgr can manage the image, container and volume for you.
There are a few ways to ingest data into the local engine.
For external repositories, the Splitgraph Engine can "mount" upstream data sources by using sgr mount. This feature is built around Postgres Foreign Data Wrappers (FDW). You can write custom "mount handlers" for any upstream data source. For an example, we blogged about making a custom mount handler for HackerNews stories.
For hosted datasets (like this repository), where the author has pushed Splitgraph Images to the repository, you can "clone" and/or "checkout" the data using sgr clone and sgr checkout.
Cloning Data
Because ny-gov/biodiversity-by-county-distribution-of-animals-tk82-7km5:latest is a Splitgraph Image, you can clone the data from Splitgraph Cloud to your local engine, where you can query it like any other Postgres database, using any of your existing tools.
First, install Splitgraph if you haven't already.
Clone the metadata with sgr clone
This will be quick, and does not download the actual data.
sgr clone ny-gov/biodiversity-by-county-distribution-of-animals-tk82-7km5
Checkout the data
Once you've cloned the data, you need to "checkout" the tag that you want. For example, to checkout the latest tag:
sgr checkout ny-gov/biodiversity-by-county-distribution-of-animals-tk82-7km5:latest
This will download all the objects for the latest tag of ny-gov/biodiversity-by-county-distribution-of-animals-tk82-7km5 and load them into the Splitgraph Engine. Depending on your connection speed and the size of the data, the checkout may take some time to complete. Once it's complete, you will be able to query the data like you would any other Postgres database.
Alternatively, use "layered checkout" to avoid downloading all the data
The data in ny-gov/biodiversity-by-county-distribution-of-animals-tk82-7km5:latest is 0 bytes. If this is too big to download all at once, or if you only need to query a subset of it, you can use a layered checkout:
sgr checkout --layered ny-gov/biodiversity-by-county-distribution-of-animals-tk82-7km5:latest
This will not download all the data. Instead, it creates a schema composed of foreign tables that you can query as you would any other data. Splitgraph will lazily download the required objects as you query the data. In some cases, this might be faster or more efficient than a regular checkout.
Read the layered querying documentation to learn about when and why you might want to use layered queries.
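The clone and checkout steps described in this section, including the layered variant, can be combined into one small script. This is a sketch only: the run and fetch_image helpers and the DRY_RUN variable are hypothetical names introduced here, while the sgr commands are the ones shown above. By default the script just prints the commands, so it is safe to run before sgr is installed. Set DRY_RUN=0 to execute them.

```shell
REPO="ny-gov/biodiversity-by-county-distribution-of-animals-tk82-7km5"
TAG="latest"

# run CMD...
# Prints the command while DRY_RUN is 1 (the default in this sketch),
# and executes it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# fetch_image REPOSITORY TAG [layered]
# Clones the repository metadata, then checks out the requested tag.
# Passing "layered" as the third argument uses a layered checkout.
fetch_image() {
  run sgr clone "$1" || return 1
  if [ "${3:-}" = "layered" ]; then
    run sgr checkout --layered "$1:$2"
  else
    run sgr checkout "$1:$2"
  fi
}

fetch_image "$REPO" "$TAG"
```

Calling fetch_image "$REPO" "$TAG" layered selects the layered checkout instead of downloading every object up front.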
Query the data with your existing tools
Once you've loaded the data into your local Splitgraph Engine, you can query it with any of your existing tools. As far as they're concerned, ny-gov/biodiversity-by-county-distribution-of-animals-tk82-7km5 is just another Postgres schema.
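Because the checked-out repository appears as a schema, a query against the local engine only needs the schema and table names in double quotes. The sketch below builds such a query and, optionally, runs it through sgr sql. The local_query helper and the RUN_LOCAL_QUERY variable are hypothetical, and the sgr sql invocation is an assumption worth confirming with sgr sql --help for your installed version.

```shell
SCHEMA="ny-gov/biodiversity-by-county-distribution-of-animals-tk82-7km5"
TABLE="biodiversity_by_county_distribution_of_animals"

# local_query SCHEMA TABLE LIMIT
# Hypothetical helper: prints a SELECT with quoted schema and table names.
# Quoting is required because the schema name contains "/" and "-".
local_query() {
  printf 'SELECT "county", "common_name" FROM "%s"."%s" LIMIT %d;\n' \
    "$1" "$2" "$3"
}

QUERY="$(local_query "$SCHEMA" "$TABLE" 10)"
echo "$QUERY"

# Run against the local engine only when explicitly requested
# and when sgr is installed.
if [ "${RUN_LOCAL_QUERY:-0}" = "1" ] && command -v sgr >/dev/null 2>&1; then
  sgr sql "$QUERY"
fi
```

The printed query can also be pasted into any other Postgres client connected to the local engine.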