pa-gov/emissions-inventory-system-eis-emissions-2017-p7sf-rbb5

Query the Data Delivery Network

Query the DDN

The easiest way to query any data on Splitgraph is via the "Data Delivery Network" (DDN). The DDN is a single endpoint that speaks the PostgreSQL wire procotol. Any Splitgraph user can connect to it at data.splitgraph.com:5432 and query any version of over 40,000 datasets that are hosted or proxied by Splitgraph.

For example, you can query the emissions_inventory_system_eis_emissions_2017 table in this repository, by referencing it like:

"pa-gov/emissions-inventory-system-eis-emissions-2017-p7sf-rbb5:latest"."emissions_inventory_system_eis_emissions_2017"

or in a full query, like:

SELECT 
    ":id", -- Socrata column ID
    "unitoperationdate", -- The date in which the unit commenced operational activities
    ":@computed_region_d3gw_znnf",
    "percentspringactivity", -- The percentage of the annual activity that occurred during the Spring months (March, April, May).
    "unittype", -- Identifies the type of emissions unit activity. 100 Boiler. 270  Incinerator. 180 Process Heater 200 Furnace 220 Calciner 280 Flare 350 Process Equipment Fugitive Leaks 400 Storage Tank 430 Degreaser 450 Spray Booth or Coating Line 450 Spray Booth or Coating Line 470 Printing Line 470 Printing Line 470 Printing Line 480 Gasoline Loading Rack or Arm 600 Chemical Reactor 610 Oxidation Unit 620 Distillation Columns/Stripper 640 Mixer 680 Cooling Tower 720 Crusher 730 Grinder 740 Screen 760 Conveyor 770 Transfer Point 780 Silo 785 Open Storage Pile 255 Dryer
    "parameterunitofmeasure", -- Code and description for the unit of measure for calculation parameter value.
    "emissionfactor", -- The emission factor used for the emissions value if a calculated value was provided  (in TONS). The EMISSION_FACTOR = ROUND(source emission amount/throughput amount).
    "locationaddressstatecode", -- The alphabetic codes that represent the name of the principal administrative subdivision of the United States, Canada, or Mexico. Standard 2-digit state code.
    "emis_unit_proc_rpt_prd_emis", -- An identifier identifying the group of the previous 5 columns.
    "unitcomment", -- Any comments regarding the emissions unit activity.
    "pollutant", -- The pollutant which is controlled by the control measure. The number prefixes are part of the chemical name describing the isomer ( ions or molecules with identical formulas but distinct structures ) part of the name.  For example - 1,2,3,4,5,6 refer to the positions on the cyclohexane molecule where a chlorine atom is substituted for a hydrogen.  The carbon atoms on the cyclohexane ring can be numbered 1 through 6.  Due to the symmetry of this particular molecule, it doesn’t really matter where we start counting on the ring.  What does matter is that each number is only mentioned once, which means there is only 1 chlorine atom attached to each carbon.  One can envision other forms of Hexachlorocyclohexane in which multiple chlorines are attached to a single carbon atom.  For example: 1,1,2,2,3,3 Hexachlorocyclohexane would have two chlorine atoms attached to each of 3 sequential carbon atoms, with none on the other three…  Now 1,2,3,4,5,6 Hexachlorocyclohexane can refer to a mixture of isomers of that molecule.  This is because the ring of cyclohexane’s carbon atoms locks the attached hydrogens and chlorines onto one side of the ring, and this naming convention is not specific enough to tell which chlorine atoms are locked “above” the ring and which are “below it”.  So in order to know which specific isomer we are talked about, a more specific name is required.  This is where the other numbers can help.  “58-89-9” is the  Chemical Abstracts Service (registration) number, sometimes abbreviated CAS # or CASRN.  This is really not part of the name of the molecule, but rather a unique serial number assigned to the molecule.  The number itself is pretty much arbitrarily assigned by the American Chemical Society, but it is important in that it is unique, like a bar code for the molecule.  The number is typically written in 3 parts separated by dashes, with two to seven digits in the first part, two in the second, and one in the third.  If we google CAS#58-89-9  we find out that we are working with the “gamma” isomer of 1,2,3,4,5,6 Hexachlorocyclohexane where the chlorines are arranged around the ring in an up-up-down-up-up-down configuration.  This is also known as “Lindane” a chemical which has been used as an insecticide and a pharmaceutical.
    "unitdesigncapacity", -- The measure of the size of the unit based on the maximum continuous throughput capacity of the unit. Measured in units of  “E6BTU/HR” (million Btu/hr).
    ":@computed_region_nmsq_hqvv",
    "sp_cl_parametervalue", -- The value of the parameter.
    "locationcountyname", -- Pennsylvania County Name
    "localityname", -- The name of the city, town, village, or other locality.
    "geocoded_column", -- Georeferenced point column with Latitude and Longitude for use in creating map visuals. A generic point is given for each county to aid in the creation of placing metrics on map by county. 
    "proc_approachcomment", -- Comments regarding the control approach.
    "totalemissions", -- Total calculated or estimated amount of the pollutant (in TONS).
    "proc_controlapproachdesc", -- Description of the overall control system or approach applied to an emissions unit or process.
    "emissioncalculationmethod", -- Defines the method used to calculate emissions.
    "averagehoursperday", -- The average number of hours per day that the emissions process is active within the reporting period.
    "proc_identifier", -- An identifier by which an element is referred to in another system.
    "processcomment", -- Any comments regarding the emissions process.
    "sourceclassificationcode", -- EPA Source Classification code that identifies an emissions process. The description is given in the next field, processdescription.
    "stateandcountyfipscode", -- The list is from FIPS Counties codes used for the identification of the Counties and County equivalents of the United States.  The FIPS county code is a five-digit Federal Information Processing Standard (FIPS) code which uniquely identifies counties and county equivalents in the United States, certain U.S. possessions, and certain freely associated states. The first 2-digits are the State code and the last 3-digits are the county codes. 42001 Adams            42003 Allegheny        42005 Armstrong        42007 Beaver           42009 Bedford          42011 Berks            42013 Blair            42015 Bradford         42017 Bucks            42019 Butler           42021 Cambria          42023 Cameron          42025 Carbon           42027 Centre           42029 Chester          42031 Clarion          42033 Clearfield       42035 Clinton          42037 Columbia         42039 Crawford         42041 Cumberland       42043 Dauphin          42045 Delaware         42047 Elk              42049 Erie             42051 Fayette          42053 Forest           42055 Franklin         42057 Fulton           42059 Greene           42061 Huntingdon       42063 Indiana          42065 Jefferson        42067 Juniata          42069 Lackawanna       42071 Lancaster        42073 Lawrence         42075 Lebanon          42077 Lehigh           42079 Luzerne          42081 Lycoming         42083 McKean           42085 Mercer           42087 Mifflin          42089 Monroe           42091 Montgomery       42093 Montour          42095 Northampton      42097 Northumberland   42099 Perry            42101 Philadelphia     42103 Pike             42105 Potter           42107 Schuylkill       42109 Snyder           42111 Somerset         42113 Sullivan         42115 Susquehanna      42117 Tioga            42119 Union            42121 Venango          42123 Warren           42125 Washington       42127 Wayne            42129 Westmoreland     42131 Wyoming          42133 York            
    "reporting_pollutant", -- Code identifying the pollutant for which emissions are reported.
    "unitdescription", -- Text description of the emissions unit.
    "facilitysiteidentifier", -- A state's designated identifier by which the facility site is referred to by a system. The name of the facility is in the Facility Site Name  column.
    "facilitysiteid", -- Identifiers by which the facility site is known or has been know, and the system associated with the identifier
    "percentfallactivity", -- The percentage of the annual activity that occurred during the Fall months (September, October, November).
    "percentsummeractivity", -- The percentage of the annual activity that occurred during the Summer months (June, July, August).
    "denominatorunitofmeasurecode", -- The denominator for the unit of measure of the reported emission factor  (in TONS).
    "processdescription", -- A text description of the emissions process.
    "emis_unit_proc_id", -- A designator used to uniquely identify an emissions process.
    "unit_effectivedate", -- The date on which the identifier became effective.
    "sp_cl_parametercomment", -- Any comments regarding the parameter. Null - TON,  Code values are as follows: E6FT3S: MILLION STANDARD CUBIC FEET,  E3GAL: 1000 GALLONS, TON:TONS,  E6BTU:MILLION BTUS
    "averagedaysperweek", -- The average number of days per week that the emissions process is active within the reporting period.
    "percentwinteractivity", -- The percentage of the annual activity that occurred during the Winter months (December, January, February).
    "averageweeksperperiod", -- The average number of weeks that the emissions process is active within the reporting period.
    "emis_unit_proc_rpt_prd_id", -- An identifier identifying the group of the next 13 columns.
    "sp_cl_parametertype", -- Name of the parameter that describes the type of activity, throughput or input used in the calculation.
    "calculationmaterial", -- Name of the material or fuel processed.
    "actualhoursperperiod", -- Actual number of hours the process is active or operating during for the reporting period.
    "reductionefficiency", -- The percent reduction achieved for the pollutant when all control measures are operating as designed.
    "calculationparametervalue", -- Activity or throughput of the process for a given time period.
    "period_enddate", -- The date on which the reporting period ended.
    "controlmeasure", -- Identifies the piece of equipment or practice that is used to reduce one or more pollutants. 11 Single Cyclone 13 Multiple Cyclone 14 Centrifugal Separator 21 Spray Chamber 22 Cyclone Scrubber 23 Orfice Scrubber 24 Mechanical Scrubber 25 Venturi Scrubber 26 Packed Tower 27 Wet Filters 28 Water Curtain 29 Water Spray 31 Baghouse - Manual Cleaning 32 Baghouse - Mechanical Shakers 33 Baghouse - Pneumatic Shakers 34 Baghouse - Bag Collapse 35 Baghouse - Sonic Cleaning 36 Baghouse - Reverse Air Flow 37 Baghouse - Reverse Air Jets 39 Baghouse - Unknown Cleaning Mechanism 41 Single Stage Electrostatic Precipitator 42 Two Stage Electrostatic Precipitator 51 Settling Chamber 52 Impingement Separator 53 Panel Filters 54 Enclosure 55 Gravel Bed Filter 56 Annular Ring Filter 57 Demister 58 Continuous Medium Filter 61 Direct Flame Incin. w/o heat exchange 62 Catalytic Incinerator w/o heat exchange 63 Direct Flame Incin. with heat exchange 64 Catalytic Incinerator with heat exchange 71 Fixed Bed Adsorber 72 Continuous Adsorber 81 Surface Condenser 82 Contact Condenser 91 Packed Tower Adsorber 92 Plate or Tray Tower Adsorber 93 Spray Tower Adsorber 94 Spray Chamber Adsorber 95 Venturi Chamber Adsorber 96 Impingement Scrubber 97 Adsorption Column 101 Selective Catalytic Reduction 102 Selective Non-Catalytic Reduction 103 Non-Selective Catalytic Reduction 104 Oxidation Catalyst 105 Low NOx Burners 106 Water Injection 107 Steam Injection 108 Flue Gas Recirculation (FGR) 111 Heat Recovery Steam Generator (HRSG)
    "sp_cl_parameternumeratorumc", -- The numerator unit of measure for the parameter.  When this is null, then Parameter Value is a percentage as described in the Parameter Comment. Code values are as follows: E6FT3S: MILLION STANDARD CUBIC FEET,  E3GAL: 1000 GALLONS, TON:TONS,  E6BTU:MILLION BTUS
    "controlapproachdescription", -- Identifies the overall control system or approach, including capture effectiveness, where applied at an emissions unit to reduce the amount of pollutants released into the environment.
    "period_startdate", -- The date on which the reporting period began.  Applies to the reporting of episodic or event emissions only.
    "emis_unit_id", -- Identifiers by which the emissions unit is known or has been known.
    "averagepercentemissions", -- The average annual percent of an emissions process that is vented through a release point.
    "release_point_desc", -- Identifiers by which the emissions release point is known or has been known.
    "emissionsyear", -- Year of the emission (discharge of criteria air pollutants (CAPs) and hazardous air pollutants (HAPs))
    "effectivedate", -- Date that the Air Quality primary facility was created in eFACTS.
    ":@computed_region_rayf_jjgk",
    "longitudemeasure", -- The measure of the angular distance on a meridian east or est of the prime meridian. 
    "ems_unt_prc_rpt_pr_sp_cl", -- An identifier identifying the group of the next 5 columns.
    ":@computed_region_r6rf_p9et",
    "latitudemeasure", -- The measure of the angular distance on a meridian north or south of the equator. 
    ":@computed_region_amqz_jbr4",
    "locationaddresspostalcode", -- The code the represents a U.S. Zip code or International postal code. 
    "locationcountycode", -- State's County Code. For Pennsylvania the county codes are from 01 - 67 in Alphabetical order.
    "sp_cl_parameterdenominatorumc" -- The denominator unit of measure for the parameter.  When this is null, then Parameter Value is a percentage as described in the Parameter Comment. Code values are as follows: E6FT3S: MILLION STANDARD CUBIC FEET,  E3GAL: 1000 GALLONS, TON:TONS,  E6BTU:MILLION BTUS
FROM
    "pa-gov/emissions-inventory-system-eis-emissions-2017-p7sf-rbb5:latest"."emissions_inventory_system_eis_emissions_2017"
LIMIT 100;

Connecting to the DDN is easy. All you need is an existing SQL client that can connect to Postgres. As long as you have a SQL client ready, you'll be able to query pa-gov/emissions-inventory-system-eis-emissions-2017-p7sf-rbb5 with SQL in under 60 seconds.

This repository is an "external" repository. That means it's hosted elsewhere, in this case at data.pa.gov. When you querypa-gov/emissions-inventory-system-eis-emissions-2017-p7sf-rbb5:latest on the DDN, we "mount" the repository using the socrata mount handler. The mount handler proxies your SQL query to the upstream data source, translating it from SQL to the relevant language (in this case SoQL).

We also cache query responses on the DDN, but we run the DDN on multiple nodes so a CACHE_HIT is only guaranteed for subsequent queries that land on the same node.

Query Your Local Engine

Install Splitgraph Locally
bash -c "$(curl -sL https://github.com/splitgraph/splitgraph/releases/latest/download/install.sh)"
 

Splitgraph Cloud is built around Splitgraph Core (GitHub), which includes a local Splitgraph Engine packaged as a Docker image. Splitgraph Cloud is basically a scaled-up version of that local Engine. When you query the Data Delivery Network or the REST API, we mount the relevant datasets in an Engine on our servers and execute your query on it.

It's possible to run this engine locally. You'll need a Mac, Windows or Linux system to install sgr, and a Docker installation to run the engine. You don't need to know how to actually use Docker; sgrcan manage the image, container and volume for you.

There are a few ways to ingest data into the local engine.

For external repositories (like this repository), the Splitgraph Engine can "mount" upstream data sources by using sgr mount. This feature is built around Postgres Foreign Data Wrappers (FDW). You can write custom "mount handlers" for any upstream data source. For an example, we blogged about making a custom mount handler for HackerNews stories.

For hosted datasets, where the author has pushed Splitgraph Images to the repository, you can "clone" and/or "checkout" the data using sgr cloneand sgr checkout.

Mounting Data

This repository is an external repository. It's not hosted by Splitgraph. It is hosted by data.pa.gov, and Splitgraph indexes it. This means it is not an actual Splitgraph image, so you cannot use sgr clone to get the data. Instead, you can use the socrata adapter with the sgr mount command. Then, if you want, you can import the data and turn it into a Splitgraph image that others can clone.

First, install Splitgraph if you haven't already.

Mount the table with sgr mount

sgr mount socrata \
  "pa-gov/emissions-inventory-system-eis-emissions-2017-p7sf-rbb5" \
  --handler-options '{
    "domain": "data.pa.gov",
    "tables": {
        "emissions_inventory_system_eis_emissions_2017": "p7sf-rbb5"
    }
}'

That's it! Now you can query the data in the mounted table like any other Postgres table.

Query the data with your existing tools

Once you've loaded the data into your local Splitgraph engine, you can query it with any of your existing tools. As far as they're concerned, pa-gov/emissions-inventory-system-eis-emissions-2017-p7sf-rbb5 is just another Postgres schema.

Related Documentation: