Splitgraph has been acquired by EDB!

pa-gov/covid19-total-expenditures-current-statewide-89u9-g8cy

Query the Data Delivery Network

The easiest way to query any data on Splitgraph is via the "Data Delivery Network" (DDN). The DDN is a single endpoint that speaks the PostgreSQL wire protocol. Any Splitgraph user can connect to it at data.splitgraph.com:5432 and query any version of over 40,000 datasets that are hosted or proxied by Splitgraph.

For example, you can query the covid19_total_expenditures_current_statewide table in this repository, by referencing it like:

"pa-gov/covid19-total-expenditures-current-statewide-89u9-g8cy:latest"."covid19_total_expenditures_current_statewide"

or in a full query, like:

SELECT
    ":id", -- Socrata column ID
    "commitment_item_description", -- Provides a textual description for each respective technical Commitment Item assignment.
    "commitment_item", -- Represents budget and fund accounting classification of general ledger accounts and cost elements in the SAP funds management (FM) module. It is used to classify the type of expenditures being budgeted and to detail the balances for each fund in the FM module. In this data set, commitment items provide a standard, more detailed classification of expenditures from a fund. Individual commitment items are represented by a seven-digit number. 
    "fund_description", -- Provides a textual description for each respective technical Fund assignment.
    "business_area", -- A department/agency of the Commonwealth of PA. In this dataset, business area represents the department/agency spending the funds. 
    "actual_expends_in_1_000", -- Dollars authorized by a department/agency to be spent. Figures in this data set are reported in thousands of dollars. Expenditures listed are amount spent to date since the beginning of the COVID-19 pandemic in both fiscal years 2019-20 and 2020-21. 
    "fund", -- An independent fiscal and accounting entity comprising a source of money set aside by law for the purpose of carrying on specific activities in accordance with special regulations, restrictions, or limitations. Typically, the General Assembly appropriates an authorized spending limits for each fund each year. Executive authorizations, where the General Assembly previously provided blanket authorization, may also set spending limits in certain funds.  Individual funds are identified by a unique ten-digit number.   Fund numbers beginning with “87” are provided by the federal government for COVID relief. Funds 5000400890 and 5000500890-5000500893 are enhanced COVID unemployment benefits provided by the federal government. Other funds listed are state matching dollars for federal COVID relief-related grants. Most federal COVID relief programs required no state match.
    "functional_area", -- Reflects one of the eight major Commonwealth programs. Programs include:  •	Direction and Supportive Services •	Protection of Persons and Property •	Education •	Health and Human Services •	Economic Development •	Transportation •	Recreation and Cultural Enrichment •	Debt Service
    "funding_source", -- State or Federal Funding
    "funds_started_march_2020" -- Funds accumulating since March 2020 until the date listed. 
FROM
    "pa-gov/covid19-total-expenditures-current-statewide-89u9-g8cy:latest"."covid19_total_expenditures_current_statewide"
LIMIT 100;

Connecting to the DDN is easy. All you need is an existing SQL client that can connect to Postgres. As long as you have a SQL client ready, you'll be able to query pa-gov/covid19-total-expenditures-current-statewide-89u9-g8cy with SQL in under 60 seconds.

Query Your Local Engine

Install Splitgraph Locally

bash -c "$(curl -sL https://github.com/splitgraph/splitgraph/releases/latest/download/install.sh)"

Read the installation docs.

Splitgraph Cloud is built around Splitgraph Core (GitHub), which includes a local Splitgraph Engine packaged as a Docker image. Splitgraph Cloud is basically a scaled-up version of that local Engine. When you query the Data Delivery Network or the REST API, we mount the relevant datasets in an Engine on our servers and execute your query on it.

It's possible to run this engine locally. You'll need a Mac, Windows or Linux system to install sgr, and a Docker installation to run the engine. You don't need to know how to actually use Docker; sgrcan manage the image, container and volume for you.

There are a few ways to ingest data into the local engine.

For external repositories, the Splitgraph Engine can "mount" upstream data sources by using sgr mount. This feature is built around Postgres Foreign Data Wrappers (FDW). You can write custom "mount handlers" for any upstream data source. For an example, we blogged about making a custom mount handler for HackerNews stories.

For hosted datasets (like this repository), where the author has pushed Splitgraph Images to the repository, you can "clone" and/or "checkout" the data using sgr cloneand sgr checkout.

Cloning Data

Because pa-gov/covid19-total-expenditures-current-statewide-89u9-g8cy:latest is a Splitgraph Image, you can clone the data from Spltgraph Cloud to your local engine, where you can query it like any other Postgres database, using any of your existing tools.

First, install Splitgraph if you haven't already.

Clone the metadata with `sgr clone`

This will be quick, and does not download the actual data.

sgr clone pa-gov/covid19-total-expenditures-current-statewide-89u9-g8cy

Checkout the data

Once you've cloned the data, you need to "checkout" the tag that you want. For example, to checkout the latest tag:

sgr checkout pa-gov/covid19-total-expenditures-current-statewide-89u9-g8cy:latest

This will download all the objects for the latest tag of pa-gov/covid19-total-expenditures-current-statewide-89u9-g8cy and load them into the Splitgraph Engine. Depending on your connection speed and the size of the data, you will need to wait for the checkout to complete. Once it's complete, you will be able to query the data like you would any other Postgres database.

Alternatively, use "layered checkout" to avoid downloading all the data

The data in pa-gov/covid19-total-expenditures-current-statewide-89u9-g8cy:latest is 0 bytes. If this is too big to download all at once, or perhaps you only need to query a subset of it, you can use a layered checkout.:

sgr checkout --layered pa-gov/covid19-total-expenditures-current-statewide-89u9-g8cy:latest

This will not download all the data, but it will create a schema comprised of foreign tables, that you can query as you would any other data. Splitgraph will lazily download the required objects as you query the data. In some cases, this might be faster or more efficient than a regular checkout.

Read the layered querying documentation to learn about when and why you might want to use layered queries.

Query the data with your existing tools

Once you've loaded the data into your local Splitgraph Engine, you can query it with any of your existing tools. As far as they're concerned, pa-gov/covid19-total-expenditures-current-statewide-89u9-g8cy is just another Postgres schema.

pa-gov/covid19-total-expenditures-current-statewide-89u9-g8cy

Query the Data Delivery Network

Query Your Local Engine

Cloning Data

Clone the metadata with `sgr clone`

Checkout the data

Alternatively, use "layered checkout" to avoid downloading all the data

Query the data with your existing tools

Related Documentation:

Product

Support

Company

Community

Query the Data Delivery Network

Query Your Local Engine

Cloning Data

Clone the metadata with sgr clone

Checkout the data

Alternatively, use "layered checkout" to avoid downloading all the data

Query the data with your existing tools

Related Documentation:

Product

Support

Company

Community

Clone the metadata with `sgr clone`