pa-gov/emissions-inventory-system-eis-facilities-2017-e7ip-7qrs
Icon for Socrata external plugin

Query the Data Delivery Network

Query the DDN

The easiest way to query any data on Splitgraph is via the "Data Delivery Network" (DDN). The DDN is a single endpoint that speaks the PostgreSQL wire protocol. Any Splitgraph user can connect to it at data.splitgraph.com:5432 and query any version of over 40,000 datasets that are hosted or proxied by Splitgraph.

For example, you can query the emissions_inventory_system_eis_facilities_2017 table in this repository, by referencing it like:

"pa-gov/emissions-inventory-system-eis-facilities-2017-e7ip-7qrs:latest"."emissions_inventory_system_eis_facilities_2017"

or in a full query, like:

SELECT
    ":id", -- Socrata column ID
    "rpi_identifier", -- Identifiers by which the emissions release point is known or has been known.
    "rel_pt_iden_id", -- Primary Key used to help group columns.
    "affl_locationaddresstext", -- The physical location of a facility site or organization.
    "affl_org_addr_id", -- A unique identifier for each affiliation address consisting of the 5 address columns.
    "organizationformalname", -- Name of the organization. 
    "datacollectiondate", -- This is the date that the geographic location data (the lat/longs) was obtained.
    "effectivedate", -- Date that the Air Quality primary facility was created in eFACTS. 
    "geocoded_column", -- Georeferenced column for use in creating mapping visualizations with both latitude and longitude for the facility site. 
    "rp_fugitiveheightmeasure", -- The fugitive release height unit of measure. Values as as follows: FT = Feet; Null = no measurement
    "rp_exitgasflowratemeasure", -- The unit of measure for the stack gas flow rate value.  Measurements in ACFS (Actual Cubic Feet per Second)
    "rp_stackheightunitofmeascd", -- The stack height unit of measure. Values as as follows: FT = Feet; Null = no measurement
    "affl_effectivedate", -- Date that the eFACTS client record was created. 
    "verticalcollectionmethodcode", -- The code that identifies the method used to collect the vertical measure (i.e., the altitude) of a reference point. 001 = A method used to determine vertical components based on altimetry 002 = Geographic coordinate determination based on classical surveying techniques 003 = Geographic coordinate determination method based on GPS carrier phase kinematic relative positioning technique 004 = Geographic coordinate determination method based on GPS carrier phase static positioning technique 005 = Geographic coordinate determination method based on GPS code measurements (pseudo range) differential (DGPS) 006 = Geographic coordinate determination method based on GPS code measurements (pseudo range) precise positioning service 007 = Geographic coordinate determination method based on GPS code measurements (pseudo range) standard positioning service (SA off) 008 = Geographic coordinate determination method based on GPS code measurements (pseudo range) standard positioning service (SA on) 009 = Leveling between non-benchmark control points 010 = Unknown 011 = Photogrammetric 012 = Precise leveling from a bench mark 013 = Topographic map interpolation 014 = Trigonometric leveling.
    "locationcountycode", -- State's county code. For PA this is a number from 1 to 67.
    "naicsindustry", -- The facility's industry description. NAICS is a mnuemonic for North American Industry Classification System. (over 1000 codes; codes are in NAICSCODE and descriptions are in NAICSINDUSTRY)
    "rp_comment", -- Any comments regarding the release point.
    "rp_exitgastemperaturemeasure", -- The temperature of an exit gas stream (measured in degrees Fahrenheit).
    "sourcemapscalenumber", -- The number that represents the proportional distance on the ground for one unit of measure on the map or photo. Any number or blank if unknown.
    "locationcountyname", -- Pennsylvania County name.
    "rpi_effectivedate", -- Date the emissions release point or stack was installed at a facility.
    "rp_exitgasvelocityunitofmeascd", -- The unit of measure for the velocity of an exit gas stream value. Values as as follows: FPS = Feet per second Null = no measurement
    "rp_description", -- Additional text description of the release point.
    "rp_id", -- Primary Key used to help group Release Point columns.
    "affl_localityname", -- The name of the city, town, village, or other locality.
    "affl_supplementallocationtext", -- The text that provides additional information about a place, including a building name with its secondary unit and number, an industrial park name, an installation name, or descriptive text where no formal address is available.
    "affl_org_iden", -- The affiliated organization which directs, is responsible for, or has authority over the activities and operations of the facility site.  Derived from the client id from eFACTS.
    "affiliationtype", -- Identifies the function that an organization or individual serves, or the relationship between an individual or organization and the facility site. Values are operator, owner, or unknown.
    "verticalcollectionmethod", -- Identifies the method used to collect the vertical measure (i.e., the altitude) of a reference point. See Vertical Collection Method Code column to view all possible values.
    "geographicreferencepoint", -- Represents the place for which geographic coordinates were established. See Geographic Reference Point Code column to view all possible values.
    "horizontalcollectionmethodcode", -- The code that identifies the method used to determine the latitude and longitude coordinates for a point on the earth.
    "horizontalaccuracymeasure", -- The horizontal measure, in meters, of the relative accuracy of the latitude and longitude coordinates.
    "supplementallocationtext", -- The text that provides additional information about a place, including a building name with its secondary unit and number, an industrial park name, an installation name, or descriptive text where no formal address is available. 
    "locationaddresstext", -- The physical location of a facility site or organization. 
    "latitudemeasure", -- The measure of the angular distance on a meridian north or south of the equator for the Facility site.
    "localityname", -- The name of the city, town, village, or other locality.
    "stateandcountyfipscode", -- The list is from FIPS Counties codes used for the identification of the Counties and County equivalents of the United States. The FIPS county code is a five-digit Federal Information Processing Standard (FIPS) code which uniquely identifies counties and county equivalents in the United States, certain U.S. possessions, and certain freely associated states. The first 2-digits are the State code and the last 3-digits are the county codes. 42001 Adams 42003 Allegheny 42005 Armstrong 42007 Beaver 42009 Bedford 42011 Berks 42013 Blair 42015 Bradford 42017 Bucks 42019 Butler 42021 Cambria 42023 Cameron 42025 Carbon 42027 Centre 42029 Chester 42031 Clarion 42033 Clearfield 42035 Clinton 42037 Columbia 42039 Crawford 42041 Cumberland 42043 Dauphin 42045 Delaware 42047 Elk 42049 Erie 42051 Fayette 42053 Forest 42055 Franklin 42057 Fulton 42059 Greene 42061 Huntingdon 42063 Indiana 42065 Jefferson 42067 Juniata 42069 Lackawanna 42071 Lancaster 42073 Lawrence 42075 Lebanon 42077 Lehigh 42079 Luzerne 42081 Lycoming 42083 McKean 42085 Mercer 42087 Mifflin 42089 Monroe 42091 Montgomery 42093 Montour 42095 Northampton 42097 Northumberland 42099 Perry 42101 Philadelphia 42103 Pike 42105 Potter 42107 Schuylkill 42109 Snyder 42111 Somerset 42113 Sullivan 42115 Susquehanna 42117 Tioga 42119 Union 42121 Venango 42123 Warren 42125 Washington 42127 Wayne 42129 Westmoreland 42131 Wyoming 42133 York
    "facilitysiteidentifier", -- A state's designated identifier by which the facility site is referred to by a system. The name of the facility is in the Facility Site Name column.
    "naicscode", -- The code that represents a subdivision of an industry that accommodates user needs in the United States. NAICS is a mnuemonic for North American Industry Classification System. (over 1000 codes; codes are in NAICS Code and descriptions are in NAICS Industry)
    "facilitysitename", -- The name assigned to the facility site by the reporter, i.e., name for the Facility Site Identifier.
    "facilitycategorycode", -- Code that identifies the Clean Air Act Stationary Source designation. CAP = (CAP MAJOR): Facility is Major based upon 40 CFR 70 Major Source definition paragraph 2 (100 tpy any CAP.  Also meets paragraph 3 definition, but NOT paragraph 1 definition).  HAP = (HAP MAJOR): Facility is Major based upon only 40 CFR 70 Major Source definition paragraph 1 (10/25 tpy HAPs).  HAPCAP =(HAP AND CAP MAJOR): Facility meets both paragraph 1 and 2 of 40 CFR 70 Major Source definitions (10/25 tpy HAPs and 100 tpy any CAP).  HAPOZN = (HAP AND O3 N/A MAJOR): Facility meets both paragraph 1 and 3 of 40 CFR 70 Major Source definitions (10/25 tpy HAPs and Ozone n/a area lesser tons for NOx or VOC).  NON = (NON-MAJOR): Facility's Potential To Emit is below all 40 CFR 70 Major Source threshold definitions without a FESOP.  OZN = (O3 N/A MAJOR): Facility is Major based upon only 40 CFR 70 Major Source definition paragraph 3 (Ozone n/a area lesser tons for NOx or VOC).  SYN = (SYNTHETIC NON-MAJOR): Facility has a FESOP which limits its Potential To Emit below all three 40 CFR 70 Major Source definitions.  UNK = (UNKNOWN): Facility category per 40 CFR 70 Major Source definitions is unknown.
    "facilitysiteid", -- Identifiers by which the facility site is known or has been known, and the system associated with the identifier.
    "facilitycategory", -- Identifies the Clean Air Act Stationary Source designation. (text provided in Facility Category Code field)
    "affl_locationaddresspostalcode", -- The code that represents a U.S. ZIP code or International postal code.
    "horizontalcollectionmethod", -- Identifies the method used to determine the latitude and longitude coordinates for a point on the earth. See code column above to view all possible values.
    "locationaddresspostalcode", -- The code that represents a U.S. ZIP code or International postal code. 5 or 9 digit postal codes.
    "facilitysitestatus", -- Code that identifies the operating status of the facility site. Values are OP - Operating; PS - Permanently Shutdown; TS - Temporarily Shutdown. 
    "emissionsyear", -- Year of the emission 
    "rp_stackheightmeasure", -- The height of the stack from the ground.
    "rp_type", -- Identifies the type of release point. 1 = fugitive emissions 2 = vertical emissions, unobstructed opening 3 = horizontal or nearly horizontal emissions 5 = vertical with weather cap or similar obstruction 6 = downward or nearly downward emissions.
    "rp_typecode", -- Code that identifies the type of release point. Values are as follows and described in the Release point type and description columns: 1 = fugitive emissions 2 = vertical emissions, unobstructed opening 3 = horizontal or nearly horizontal emissions 5 = vertical with weather cap or similar obstruction 6 = downward or nearly downward emissions
    "rp_fugitiveheightunitofmeascd", -- The fugitive release height unit of measure. Values as as follows: FT = Feet; Null = no measurement
    "rp_exitgasvelocitymeasure", -- The velocity of an exit gas stream.
    "rp_stackdiametermeasure", -- The internal diameter of the stack at the release height.
    "locationaddressstatecode", -- The alphabetic codes that represent the name of the principal administrative subdivision of the United States, Canada, or Mexico. Standard 2 char state codes
    "geographicreferencepointcode", -- The code that represents the place for which geographic coordinates were established. 022 = depricated 101 = Any point on a Facility/System or associated with the substance or monitor 102 = The center of a Facility/System defined as the Geographic Centroid of the Facility 106 = Point where substance is released (point where substance is directly released into the environment) 108 = Points not represented by general codes 101-107, 109 or their specific codes-Unknown.
    "horizontalreferencedatum", -- Represents the reference datum used in determining latitude and longitude coordinates. See Horizontal Reference Datum Code column to view all possible values.
    "horizontalreferencedatumcode", -- The code that represents the reference datum used in determining latitude and longitude coordinates. 1  = North American Datum of 1927 2  = North American Datum of 1983 3  = World Geodetic System of 1984 (Previously GEO84) 21 = Unknown
    "longitudemeasure", -- The measure of the angular distance on a meridian east or west of the prime meridian for the Facility site.
    "rp_stackdiameterunitofmeascd", -- The stack diameter unit of measure. Values as as follows: FT = Feet; Null = no measurement
    "affl_locationaddressstatecode", -- The alphabetic codes that represent the name of the principal administrative subdivision of the United States, Canada, or Mexico.
    ":@computed_region_rayf_jjgk",
    ":@computed_region_r6rf_p9et",
    ":@computed_region_amqz_jbr4",
    ":@computed_region_d3gw_znnf",
    ":@computed_region_nmsq_hqvv"
FROM
    "pa-gov/emissions-inventory-system-eis-facilities-2017-e7ip-7qrs:latest"."emissions_inventory_system_eis_facilities_2017"
LIMIT 100;

Connecting to the DDN is easy. All you need is an existing SQL client that can connect to Postgres. As long as you have a SQL client ready, you'll be able to query pa-gov/emissions-inventory-system-eis-facilities-2017-e7ip-7qrs with SQL in under 60 seconds.

This repository is an "external" repository. That means it's hosted elsewhere, in this case at data.pa.gov. When you querypa-gov/emissions-inventory-system-eis-facilities-2017-e7ip-7qrs:latest on the DDN, we "mount" the repository using the socrata mount handler. The mount handler proxies your SQL query to the upstream data source, translating it from SQL to the relevant language (in this case SoQL).

We also cache query responses on the DDN, but we run the DDN on multiple nodes so a CACHE_HIT is only guaranteed for subsequent queries that land on the same node.

Query Your Local Engine

Install Splitgraph Locally
bash -c "$(curl -sL https://github.com/splitgraph/splitgraph/releases/latest/download/install.sh)"
 

Read the installation docs.

Splitgraph Cloud is built around Splitgraph Core (GitHub), which includes a local Splitgraph Engine packaged as a Docker image. Splitgraph Cloud is basically a scaled-up version of that local Engine. When you query the Data Delivery Network or the REST API, we mount the relevant datasets in an Engine on our servers and execute your query on it.

It's possible to run this engine locally. You'll need a Mac, Windows or Linux system to install sgr, and a Docker installation to run the engine. You don't need to know how to actually use Docker; sgrcan manage the image, container and volume for you.

There are a few ways to ingest data into the local engine.

For external repositories (like this repository), the Splitgraph Engine can "mount" upstream data sources by using sgr mount. This feature is built around Postgres Foreign Data Wrappers (FDW). You can write custom "mount handlers" for any upstream data source. For an example, we blogged about making a custom mount handler for HackerNews stories.

For hosted datasets, where the author has pushed Splitgraph Images to the repository, you can "clone" and/or "checkout" the data using sgr cloneand sgr checkout.

Mounting Data

This repository is an external repository. It's not hosted by Splitgraph. It is hosted by data.pa.gov, and Splitgraph indexes it. This means it is not an actual Splitgraph image, so you cannot use sgr clone to get the data. Instead, you can use the socrata adapter with the sgr mount command. Then, if you want, you can import the data and turn it into a Splitgraph image that others can clone.

First, install Splitgraph if you haven't already.

Mount the table with sgr mount

sgr mount socrata \
  "pa-gov/emissions-inventory-system-eis-facilities-2017-e7ip-7qrs" \
  --handler-options '{
    "domain": "data.pa.gov",
    "tables": {
        "emissions_inventory_system_eis_facilities_2017": "e7ip-7qrs"
    }
}'

That's it! Now you can query the data in the mounted table like any other Postgres table.

Query the data with your existing tools

Once you've loaded the data into your local Splitgraph engine, you can query it with any of your existing tools. As far as they're concerned, pa-gov/emissions-inventory-system-eis-facilities-2017-e7ip-7qrs is just another Postgres schema.