pa-gov/covid19-total-expenditures-current-statewide-89u9-g8cy
Icon for Socrata external plugin

Query the Data Delivery Network

Query the DDN

The easiest way to query any data on Splitgraph is via the "Data Delivery Network" (DDN). The DDN is a single endpoint that speaks the PostgreSQL wire protocol. Any Splitgraph user can connect to it at data.splitgraph.com:5432 and query any version of over 40,000 datasets that are hosted or proxied by Splitgraph.

For example, you can query the covid19_total_expenditures_current_statewide table in this repository, by referencing it like:

"pa-gov/covid19-total-expenditures-current-statewide-89u9-g8cy:latest"."covid19_total_expenditures_current_statewide"

or in a full query, like:

SELECT
    ":id", -- Socrata column ID
    "commitment_item_description", -- Provides a textual description for each respective technical Commitment Item assignment.
    "commitment_item", -- Represents budget and fund accounting classification of general ledger accounts and cost elements in the SAP funds management (FM) module. It is used to classify the type of expenditures being budgeted and to detail the balances for each fund in the FM module. In this data set, commitment items provide a standard, more detailed classification of expenditures from a fund. Individual commitment items are represented by a seven-digit number. 
    "fund_description", -- Provides a textual description for each respective technical Fund assignment.
    "business_area", -- A department/agency of the Commonwealth of PA. In this dataset, business area represents the department/agency spending the funds. 
    "actual_expends_in_1_000", -- Dollars authorized by a department/agency to be spent. Figures in this data set are reported in thousands of dollars. Expenditures listed are amount spent to date since the beginning of the COVID-19 pandemic in both fiscal years 2019-20 and 2020-21. 
    "fund", -- An independent fiscal and accounting entity comprising a source of money set aside by law for the purpose of carrying on specific activities in accordance with special regulations, restrictions, or limitations. Typically, the General Assembly appropriates an authorized spending limits for each fund each year. Executive authorizations, where the General Assembly previously provided blanket authorization, may also set spending limits in certain funds.  Individual funds are identified by a unique ten-digit number.   Fund numbers beginning with “87” are provided by the federal government for COVID relief. Funds 5000400890 and 5000500890-5000500893 are enhanced COVID unemployment benefits provided by the federal government. Other funds listed are state matching dollars for federal COVID relief-related grants. Most federal COVID relief programs required no state match.
    "functional_area", -- Reflects one of the eight major Commonwealth programs. Programs include:  •	Direction and Supportive Services •	Protection of Persons and Property •	Education •	Health and Human Services •	Economic Development •	Transportation •	Recreation and Cultural Enrichment •	Debt Service
    "funding_source", -- State or Federal Funding
    "funds_started_march_2020" -- Funds accumulating since March 2020 until the date listed. 
FROM
    "pa-gov/covid19-total-expenditures-current-statewide-89u9-g8cy:latest"."covid19_total_expenditures_current_statewide"
LIMIT 100;

Connecting to the DDN is easy. All you need is an existing SQL client that can connect to Postgres. As long as you have a SQL client ready, you'll be able to query pa-gov/covid19-total-expenditures-current-statewide-89u9-g8cy with SQL in under 60 seconds.

This repository is an "external" repository. That means it's hosted elsewhere, in this case at data.pa.gov. When you querypa-gov/covid19-total-expenditures-current-statewide-89u9-g8cy:latest on the DDN, we "mount" the repository using the socrata mount handler. The mount handler proxies your SQL query to the upstream data source, translating it from SQL to the relevant language (in this case SoQL).

We also cache query responses on the DDN, but we run the DDN on multiple nodes so a CACHE_HIT is only guaranteed for subsequent queries that land on the same node.

Query Your Local Engine

Install Splitgraph Locally
bash -c "$(curl -sL https://github.com/splitgraph/splitgraph/releases/latest/download/install.sh)"
 

Read the installation docs.

Splitgraph Cloud is built around Splitgraph Core (GitHub), which includes a local Splitgraph Engine packaged as a Docker image. Splitgraph Cloud is basically a scaled-up version of that local Engine. When you query the Data Delivery Network or the REST API, we mount the relevant datasets in an Engine on our servers and execute your query on it.

It's possible to run this engine locally. You'll need a Mac, Windows or Linux system to install sgr, and a Docker installation to run the engine. You don't need to know how to actually use Docker; sgrcan manage the image, container and volume for you.

There are a few ways to ingest data into the local engine.

For external repositories (like this repository), the Splitgraph Engine can "mount" upstream data sources by using sgr mount. This feature is built around Postgres Foreign Data Wrappers (FDW). You can write custom "mount handlers" for any upstream data source. For an example, we blogged about making a custom mount handler for HackerNews stories.

For hosted datasets, where the author has pushed Splitgraph Images to the repository, you can "clone" and/or "checkout" the data using sgr cloneand sgr checkout.

Mounting Data

This repository is an external repository. It's not hosted by Splitgraph. It is hosted by data.pa.gov, and Splitgraph indexes it. This means it is not an actual Splitgraph image, so you cannot use sgr clone to get the data. Instead, you can use the socrata adapter with the sgr mount command. Then, if you want, you can import the data and turn it into a Splitgraph image that others can clone.

First, install Splitgraph if you haven't already.

Mount the table with sgr mount

sgr mount socrata \
  "pa-gov/covid19-total-expenditures-current-statewide-89u9-g8cy" \
  --handler-options '{
    "domain": "data.pa.gov",
    "tables": {
        "covid19_total_expenditures_current_statewide": "89u9-g8cy"
    }
}'

That's it! Now you can query the data in the mounted table like any other Postgres table.

Query the data with your existing tools

Once you've loaded the data into your local Splitgraph engine, you can query it with any of your existing tools. As far as they're concerned, pa-gov/covid19-total-expenditures-current-statewide-89u9-g8cy is just another Postgres schema.