Splitgraph has been acquired by EDB!

ny-gov/hvac-market-share-by-efficiency-and-capacity-tf22-v9nz

Query the Data Delivery Network

The easiest way to query any data on Splitgraph is via the "Data Delivery Network" (DDN). The DDN is a single endpoint that speaks the PostgreSQL wire protocol. Any Splitgraph user can connect to it at data.splitgraph.com:5432 and query any version of over 40,000 datasets that are hosted or proxied by Splitgraph.

For example, you can query the hvac_market_share_by_efficiency_and_capacity table in this repository, by referencing it like:

"ny-gov/hvac-market-share-by-efficiency-and-capacity-tf22-v9nz:latest"."hvac_market_share_by_efficiency_and_capacity"

or in a full query, like:

SELECT
    ":id", -- Socrata column ID
    "heating_btu_hr", -- Nominal heating capacity for equipment in Btu per hour. For furnaces and boilers, this is the manufacturer specified input capacity. Blank cells indicate no data available
    "seer", -- Nominal Seasonal Energy Efficiency Ratio (SEER) of the outdoor condensing equipment. SEER is the ratio of the total heat removed from the conditioned space during the annual cooling season divided by the total electrical energy consumed by the air conditioner during the same season. The higher the SEER rating, the less electrical energy the air conditioner uses to cool the home.   SEER values a reported by whole number increment and generally range from 13 to 26. For any values that exceed 26, they are binned into the 26 increment for reporting purpose.  Blank cells indicate no data available
    "eer", -- Nominal Energy Efficiency Ratio (EER). EER is the ratio of the cooling capacity in Btu’s per hour to the power input in watts. The higher the EER, the more efficient a piece of equipment is. For split systems, the “lowest matched pair” value is reported to the nearest whole number increment.    Blank cells indicate no data available
    "year", -- The time period (year) covered by the dataset
    "finalestimate", -- Estimated market sales of the record.  Zero values indicate there are no estimates sales of that unit with associated spec information – generally due to purchases and returns being equal over the reported time period. Negative values indicate that more units were returned than sold for a given product category. 
    "hspf", -- Heating Season Performance Factor (HSPF). For split systems, the lowest matched pair value is reported. HSPF represents the total heating output of a heat pump (including supplementary electric heat) during the normal heating season (in Btu) as compared to the total electricity consumed (in watt-hours) during the same period.  HSPF values in the dataset are reported as ranges. Generally, commercially available ASHPs do not exceed a value of 12.5 HSPF.   Blank cells indicate no data available
    "region", -- Name of U.S. State or HARDI Region where sale of equipment occurred; either New York State or Northeastern. For the sake of this data, the Northeastern region is defined as including Maine, New Hampshire, Vermont, Massachusetts, Connecticut, and Rhode Island; and the New York region includes all of New York State.
    "type", -- Type of HVAC equipment sold; Air Conditioner (AC), Air-Source Heat Pump (ASHP), Boiler, or Furnace
    "refrigerant", -- Type of refrigerant used in equipment.  Two options for refrigerant are included in the data set – R22 and R410a.  R22 is represented by a 22 and R410a is represented by 410  Blank cells indicate no data available
    "fueltype", -- Fuel type used by equipment (boilers and furnaces only); either Natural Gas or Oil. Blank cells indicate no data available or the field is irrelevant for the product category (ASHP or AC). 
    "units", -- Number of units sold. If the value is negative, that indicates that more units were returned than sold. 
    "afue", -- Annualized Fuel Utilization Efficiency (AFUE) of the equipment, which is a measurement of the ratio of the annual heat output of the equipment compared to the total annual fossil fuel energy consumed by the equipment. AFUE for the type of furnace tracked through this program have a minimum federal efficiency of 78 AFUE, however, all sold were at least 80 AFUE. AFUE does not exceed a nominal value of 98 in the reported data since furnaces are fossil fuel only, and exclude electric.  Blank cells indicate no data available
    "ducted", -- Type of ducting used with equipment. 1 indicates ducted and 0 indicates ductless. Blank cells indicate no data available
    "motortype", -- Type of blower motor installed in equipment; either ECM or PSC. Types include:  •	Electronically communicated motors (ECM) •	Permanent split capacitor (PSC)  ECM motors are more efficient than PSC motors because they can adjust airflow levels and the speed of the motor based on the temperature in the home. PSCs run at a constant speed regardless of the temperature of the home.    Blank cells indicate no data available
    "cooling_btu_hr" -- Nominal cooling capacity for equipment in Btu per hour. For air conditioners, ground source heat pumps, and air source heat pumps, this is the capacity of the equipment rounded to the closest half-ton increment. Blank cells indicate no data available
FROM
    "ny-gov/hvac-market-share-by-efficiency-and-capacity-tf22-v9nz:latest"."hvac_market_share_by_efficiency_and_capacity"
LIMIT 100;

Connecting to the DDN is easy. All you need is an existing SQL client that can connect to Postgres. As long as you have a SQL client ready, you'll be able to query ny-gov/hvac-market-share-by-efficiency-and-capacity-tf22-v9nz with SQL in under 60 seconds.

Query Your Local Engine

Install Splitgraph Locally

bash -c "$(curl -sL https://github.com/splitgraph/splitgraph/releases/latest/download/install.sh)"

Read the installation docs.

Splitgraph Cloud is built around Splitgraph Core (GitHub), which includes a local Splitgraph Engine packaged as a Docker image. Splitgraph Cloud is basically a scaled-up version of that local Engine. When you query the Data Delivery Network or the REST API, we mount the relevant datasets in an Engine on our servers and execute your query on it.

It's possible to run this engine locally. You'll need a Mac, Windows or Linux system to install sgr, and a Docker installation to run the engine. You don't need to know how to actually use Docker; sgrcan manage the image, container and volume for you.

There are a few ways to ingest data into the local engine.

For external repositories, the Splitgraph Engine can "mount" upstream data sources by using sgr mount. This feature is built around Postgres Foreign Data Wrappers (FDW). You can write custom "mount handlers" for any upstream data source. For an example, we blogged about making a custom mount handler for HackerNews stories.

For hosted datasets (like this repository), where the author has pushed Splitgraph Images to the repository, you can "clone" and/or "checkout" the data using sgr cloneand sgr checkout.

Cloning Data

Because ny-gov/hvac-market-share-by-efficiency-and-capacity-tf22-v9nz:latest is a Splitgraph Image, you can clone the data from Spltgraph Cloud to your local engine, where you can query it like any other Postgres database, using any of your existing tools.

First, install Splitgraph if you haven't already.

Clone the metadata with `sgr clone`

This will be quick, and does not download the actual data.

sgr clone ny-gov/hvac-market-share-by-efficiency-and-capacity-tf22-v9nz

Checkout the data

Once you've cloned the data, you need to "checkout" the tag that you want. For example, to checkout the latest tag:

sgr checkout ny-gov/hvac-market-share-by-efficiency-and-capacity-tf22-v9nz:latest

This will download all the objects for the latest tag of ny-gov/hvac-market-share-by-efficiency-and-capacity-tf22-v9nz and load them into the Splitgraph Engine. Depending on your connection speed and the size of the data, you will need to wait for the checkout to complete. Once it's complete, you will be able to query the data like you would any other Postgres database.

Alternatively, use "layered checkout" to avoid downloading all the data

The data in ny-gov/hvac-market-share-by-efficiency-and-capacity-tf22-v9nz:latest is 0 bytes. If this is too big to download all at once, or perhaps you only need to query a subset of it, you can use a layered checkout.:

sgr checkout --layered ny-gov/hvac-market-share-by-efficiency-and-capacity-tf22-v9nz:latest

This will not download all the data, but it will create a schema comprised of foreign tables, that you can query as you would any other data. Splitgraph will lazily download the required objects as you query the data. In some cases, this might be faster or more efficient than a regular checkout.

Read the layered querying documentation to learn about when and why you might want to use layered queries.

Query the data with your existing tools

Once you've loaded the data into your local Splitgraph Engine, you can query it with any of your existing tools. As far as they're concerned, ny-gov/hvac-market-share-by-efficiency-and-capacity-tf22-v9nz is just another Postgres schema.

ny-gov/hvac-market-share-by-efficiency-and-capacity-tf22-v9nz

Query the Data Delivery Network

Query Your Local Engine

Cloning Data

Clone the metadata with `sgr clone`

Checkout the data

Alternatively, use "layered checkout" to avoid downloading all the data

Query the data with your existing tools

Related Documentation:

Product

Support

Company

Community

Query the Data Delivery Network

Query Your Local Engine

Cloning Data

Clone the metadata with sgr clone

Checkout the data

Alternatively, use "layered checkout" to avoid downloading all the data

Query the data with your existing tools

Related Documentation:

Product

Support

Company

Community

Clone the metadata with `sgr clone`