Splitgraph has been acquired by EDB!

ny-gov/metropolitan-transportation-authority-mta-capital-ehz8-ag3n

Query the Data Delivery Network

The easiest way to query any data on Splitgraph is via the "Data Delivery Network" (DDN). The DDN is a single endpoint that speaks the PostgreSQL wire protocol. Any Splitgraph user can connect to it at data.splitgraph.com:5432 and query any version of over 40,000 datasets that are hosted or proxied by Splitgraph.

For example, you can query the metropolitan_transportation_authority_mta_capital table in this repository, by referencing it like:

"ny-gov/metropolitan-transportation-authority-mta-capital-ehz8-ag3n:latest"."metropolitan_transportation_authority_mta_capital"

or in a full query, like:

SELECT
    ":id", -- Socrata column ID
    "design_completion_year", -- This field shows the Design Completion Date Year for each project.
    "phase", -- This field shows the Current phase for each project. Projects begin in Planning, progress to Design, then Construction and finally Completion.
    "overall_scope_change", -- This field indicates a change in overall project scope. I = Increase / D =  Decrease / blank =  'No Change'
    "design_start_month", -- This field shows the Design Start Date Month for each project.
    "construction_start_year", -- This field shows the Construction Start Date Year for each project.
    "project_number", -- A capital project that invests funds over one or more phases of work (i.e., planning, design, purchase and installation, construction) to deliver a defined benefit (e.g., rehabilitating existing assets, improving or expanding the system). The field is a unique project identification code created by concatenating the AGENCY_CODE, CATEGORY, ELEMENT, and PROJECT.
    "plan_revision_keys", -- These are the Plan Admendment Keys that will be needed to link to the PLAN_REV field in the Capital Dashboard Detail data. The keys are all in one field because the number of keys can vary for each PROj_NUM. (Format: YYYYMM)
    "plan_revision_display_keys", -- These are the Plan Admendment Keys that are displayed on the detail page for each project. These can also be used to link to the BUDGET_SUBMISSION_DATE field in the Capital Dashboard Detail data. 
    "agency_code", -- MTA Operating Agency  (D)Bridges_and_Tunnels  (G)Capital_Construction  (L)Long_Island_Railroad  (M)Metro_North_Railroad  (N)MTA_Interagency  (S)SIRTOA  (T)New_York_City_Transit  (U)MTA_Bus_Company
    "element", -- This field shows the element, a subset of a category.
    "project", -- This field shows a unique code that identifies each project within an element.
    "agency_name", -- This field shows the descriptive MTA operating agency title.
    "element_description", -- This field shows the descriptive category title.
    "project_description", -- This field shows the descriptive project title.
    "needs_code", -- This field shows the category type for each project.  Needs Codes:  • Network Expansion;  • New Initiative;  • Normal Replacement;  • Restoration;  • Resiliency;  • State of Good Repair;  • System Improvement;  • Various;  • Program Support 
    "origional_budget", -- This field shows the original MTA Board (or Capital Program Review Board) approved budget amount for each project.
    "current_budget", -- This field shows the latest available value for the project budget as recorded in the MTA Capital Program Management System.
    "origional_budget_vs_current_budget", -- This calculated field shows the difference between the Original Budget and the Current Budget as a percentage of the Original Budget. A budget decrease exceeding 10% is indicated by green text, a budget increase exceeding 10% is indicated by red text on t
    "origional_start_month", -- This field shows the Original Construction Start Date or the start date of major activity for each project.
    "origional_start_year", -- This field shows the Original Construction Start Date or the start date of major activity for each project.
    "current_start_year", -- This field shows the Current Project Start Date Year for each project.
    "current_start_month", -- This field shows the Current Project Start Date Month for each project.
    "origional_completion_month", -- This field shows the Original Project Completion Date Month for each project.
    "current_completion_month", -- This field shows the Current Project Completion Date Month for each project.
    "current_completion_year", -- This field shows the Current Project Completion Date Year for each project.
    "design_start_year", -- This field shows the Design Start Date Year for each project.
    "construction_start_month", -- This field shows the Construction Start Date Month for each project.
    "construction_completion_year", -- This field shows the Construction Completion Date Year for each project.
    "percentage_complete", -- This field shows the percentage based on the total project budget expended.
    "location_indicator", -- This indicates whether a project has a single or multiple locations, or a system wide project, or related to rolling stock such as buses or train cars, and funding for other projects. Example: •	Base:  A project has one location and must include a Latitude and Longitude value.  •	Multilocation:  A project has multiple locations relatively close to one another. There will be multiple lines for the project, each having a unique set of Latitude and Longitude values.  •	Systemwide: Projects that span large areas such as track work or variety of system projects.  •	Car: Projects that deal with Rolling Stock for railroad or subway cars.  •	Bus: A project that deals with Rolling Stock for buses.   •	Dollar: Projects that are set up to hold funding for other projects. 
    "origional_completion_year", -- This field shows the Original Project Completion Date Year for each project.
    "construction_completion_month", -- This field shows the Construction Completion Date Month for each project.
    "scope_objective", -- This narrative provides a summary level description of the project.
    "category_description", -- This field shows the descriptive category title.
    "design_completion_month", -- This field shows the Design Completion Date Month for each project.
    "current_quarter_flag", -- CQ indicates the Load Date Records that are the most current  quarter data published to the MTA CapitalDashboard
    "category", -- This field shows the capital investment category.
    "capital_plan", -- An investment plan of the MTA Capital Program that links specific funding sources to planned investment projects over a particular period of time, typically 5-years.
    "load_date" -- The Capital Program Dashboard is published on a quarterly basis. The LOADDATE field represents the quarter assoctaed with the data. 
FROM
    "ny-gov/metropolitan-transportation-authority-mta-capital-ehz8-ag3n:latest"."metropolitan_transportation_authority_mta_capital"
LIMIT 100;

Connecting to the DDN is easy. All you need is an existing SQL client that can connect to Postgres. As long as you have a SQL client ready, you'll be able to query ny-gov/metropolitan-transportation-authority-mta-capital-ehz8-ag3n with SQL in under 60 seconds.

Query Your Local Engine

Install Splitgraph Locally

bash -c "$(curl -sL https://github.com/splitgraph/splitgraph/releases/latest/download/install.sh)"

Read the installation docs.

Splitgraph Cloud is built around Splitgraph Core (GitHub), which includes a local Splitgraph Engine packaged as a Docker image. Splitgraph Cloud is basically a scaled-up version of that local Engine. When you query the Data Delivery Network or the REST API, we mount the relevant datasets in an Engine on our servers and execute your query on it.

It's possible to run this engine locally. You'll need a Mac, Windows or Linux system to install sgr, and a Docker installation to run the engine. You don't need to know how to actually use Docker; sgrcan manage the image, container and volume for you.

There are a few ways to ingest data into the local engine.

For external repositories, the Splitgraph Engine can "mount" upstream data sources by using sgr mount. This feature is built around Postgres Foreign Data Wrappers (FDW). You can write custom "mount handlers" for any upstream data source. For an example, we blogged about making a custom mount handler for HackerNews stories.

For hosted datasets (like this repository), where the author has pushed Splitgraph Images to the repository, you can "clone" and/or "checkout" the data using sgr cloneand sgr checkout.

Cloning Data

Because ny-gov/metropolitan-transportation-authority-mta-capital-ehz8-ag3n:latest is a Splitgraph Image, you can clone the data from Spltgraph Cloud to your local engine, where you can query it like any other Postgres database, using any of your existing tools.

First, install Splitgraph if you haven't already.

Clone the metadata with `sgr clone`

This will be quick, and does not download the actual data.

sgr clone ny-gov/metropolitan-transportation-authority-mta-capital-ehz8-ag3n

Checkout the data

Once you've cloned the data, you need to "checkout" the tag that you want. For example, to checkout the latest tag:

sgr checkout ny-gov/metropolitan-transportation-authority-mta-capital-ehz8-ag3n:latest

This will download all the objects for the latest tag of ny-gov/metropolitan-transportation-authority-mta-capital-ehz8-ag3n and load them into the Splitgraph Engine. Depending on your connection speed and the size of the data, you will need to wait for the checkout to complete. Once it's complete, you will be able to query the data like you would any other Postgres database.

Alternatively, use "layered checkout" to avoid downloading all the data

The data in ny-gov/metropolitan-transportation-authority-mta-capital-ehz8-ag3n:latest is 0 bytes. If this is too big to download all at once, or perhaps you only need to query a subset of it, you can use a layered checkout.:

sgr checkout --layered ny-gov/metropolitan-transportation-authority-mta-capital-ehz8-ag3n:latest

This will not download all the data, but it will create a schema comprised of foreign tables, that you can query as you would any other data. Splitgraph will lazily download the required objects as you query the data. In some cases, this might be faster or more efficient than a regular checkout.

Read the layered querying documentation to learn about when and why you might want to use layered queries.

Query the data with your existing tools

Once you've loaded the data into your local Splitgraph Engine, you can query it with any of your existing tools. As far as they're concerned, ny-gov/metropolitan-transportation-authority-mta-capital-ehz8-ag3n is just another Postgres schema.

ny-gov/metropolitan-transportation-authority-mta-capital-ehz8-ag3n

Query the Data Delivery Network

Query Your Local Engine

Cloning Data

Clone the metadata with `sgr clone`

Checkout the data

Alternatively, use "layered checkout" to avoid downloading all the data

Query the data with your existing tools

Related Documentation:

Product

Support

Company

Community

Query the Data Delivery Network

Query Your Local Engine

Cloning Data

Clone the metadata with sgr clone

Checkout the data

Alternatively, use "layered checkout" to avoid downloading all the data

Query the data with your existing tools

Related Documentation:

Product

Support

Company

Community

Clone the metadata with `sgr clone`