Query the Data Delivery Network
The easiest way to query any data on Splitgraph is via the "Data Delivery Network" (DDN). The DDN is a single endpoint that speaks the PostgreSQL wire protocol. Any Splitgraph user can connect to it at data.splitgraph.com:5432 and query any version of over 40,000 datasets that are hosted or proxied by Splitgraph.
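To connect, you can use psql or any other Postgres client. A minimal sketch, assuming you substitute your own Splitgraph credentials for the placeholders (the ddn database name follows Splitgraph's connection examples):
# Connect to the DDN; <api_key> and <api_secret> are placeholders
# for your Splitgraph credentials.
psql "postgresql://<api_key>:<api_secret>@data.splitgraph.com:5432/ddn"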
For example, you can query the road_weather_demonstration_data table in this repository by referencing it like:
"datahub-transportation-gov/road-weather-demonstration-data-jc5e-psbx:latest"."road_weather_demonstration_data"
or in a full query, like:
SELECT
":id", -- Socrata column ID
"flag_11", -- This field contains the flag for a quality check for each data record. The SourceId field determines which quality check is depicted in this field. For SourceId = 1, this field depicts the Sea Level Pressure Test. This test is an application of the Barnes Spatial Test to sea level pressure estimates computed by correcting barometric pressure measurements for elevation and relies on daily RADIOSONDE pressure reports as reference. For SourceId = 2, this field depicts the Time-step which verifies that the change between an observation and previous observations from the same sensor is less than the maximum rate of change allowed for the sensor hardware that produced the observations. Quality checking results are interpreted from the output as follows: "-" - The quality check was not run for the observation, usually due to insufficient background field data. "/" - The quality check was not configured to run for the observation type.
"flag_12", -- This field contains the flag for a quality check for each data record. The SourceId field determines which quality check is depicted in this field. For SourceId = 1, this field depicts the Precipitation Accumulation Test. This test verifies that the precipitation accumulation values in the same geographic area are close to each other. For SourceId = 2, this field depicts the Overall Dew Temperature which is an application of the Barnes Spatial Test to dew point temperature values that are computed from air temperature and relative humidity values on the same platform. Quality checking results are interpreted from the output as follows: "/" - The quality check was not configured to run for the observation type.
"flag_10", -- This field contains the flag for a quality check for each data record. The SourceId field determines which quality check is depicted in this field. For SourceId = 1, this field depicts the Dew Point Test. This test is an application of the Barnes Spatial Test to dew point temperature values that are computed from air temperature and relative humidity values on the same platform. For SourceId = 2, this field depicts the Spatial Test – Barnes and Interquartile Range (IQR) which are quality checks for air and pavement temperature and pressure leverage work performed for the WxDE specifically for spatial quality checks. Observations are compared to the same observation type from nearby platforms. Nearby platforms are currently defined as being within a 69-mile radius and 5-minute observation time from the data point being quality checked. Where there are more than 5 observations within a 69-mile radius and 5-minute observation time, the IQR is used as a more robust method for spatial quality check. In the instance where there are fewer observations than required for an IQR, the Barnes Spatial quality check is used instead. If the observation type is for atmospheric pressure, the observation is adjusted for sea level for comparison. Quality checking results are interpreted from the output as follows: "P" - The observation passed the quality check. "N" - The observation did not pass the quality check. "-" - The quality check was not run for the observation, usually due to insufficient background field data. "/" - The quality check was not configured to run for the observation type.
"englishunits", -- This field contains the name of the English unit for the observation.
"contributor", -- This field is the name of the data contributor for this record. For the concerned data, this field contains only one value: “MI_State_DOT” since MDOT was the sole contributor for this data.
"flag_13", -- This field contains the flag for a quality check for each data record. This field is applicable only when SourceId = 2. This field depicts the Data Filtering Test. In special conditions, such as tunnels, data may be misleading (e.g., headlights and wiper status may differ in a tunnel). At present, it is used to monitor latitude and longitude values. Quality checking results are interpreted from the output as follows: "P" - The observation passed the quality check.
"timestamp", -- This field contains the UTC date and time at which the observation was recorded at the source sensor.
"latitude", -- This field is the latitude value of the location at which the recording sensor recorded the observation.
"flag_9", -- This field contains the flag for a quality check for each data record. The SourceId field determines which quality check is depicted in this field. For SourceId = 1, this field depicts the Barnes Spatial Test. This test verifies that observations from similar sensors on platforms in the same geographic area (within 69 miles distance and 5 minutes time) are close to each other. For SourceId = 2, this field depicts the Spatial Test – Barnes and Interquartile Range (IQR) which are quality checks for air and pavement temperature and pressure leverage work performed for the WxDE specifically for spatial quality checks. Observations are compared to the same observation type from nearby platforms. Nearby platforms are currently defined as being within a 69-mile radius and 5-minute observation time from the data point being quality checked. Where there are more than 5 observations within a 69-mile radius and 5-minute observation time, the IQR is used as a more robust method for spatial quality check. In the instance where there are fewer observations than required for an IQR, the Barnes Spatial quality check is used instead. If the observation type is for atmospheric pressure, the observation is adjusted for sea level for comparison. Quality checking results are interpreted from the output as follows: "P" - The observation passed the quality check. "N" - The observation did not pass the quality check. "-" - The quality check was not run for the observation, usually due to insufficient background field data. "/" - The quality check was not configured to run for the observation type.
"flag_8", -- This field contains the flag for a quality check for each data record. The SourceId field determines which quality check is depicted in this field. For SourceId = 1, this field depicts the IQR Spatial Test. This test is an alternative to the Barnes Spatial Test that verifies that observations from similar sensors in the same geographic area are close to each other. It only runs when there are five or more observations available. For SourceId = 2, this field depicts the Standard Deviation which checks whether the observation is within 2.5 standard deviation of the mean value on the road segment. Quality checking results are interpreted from the output as follows: "P" - The observation passed the quality check. "N" - The observation did not pass the quality check. "-" - The quality check was not run for the observation, usually due to insufficient background field data. "/" - The quality check was not configured to run for the observation type.
"category", -- This field defines the category of the associated platform: P stands for permanent, T for transportable, M for mobile. For the concerned data set, this field contains only one value: “M”, since all the data was collected from mobile platforms (vehicles) which are considered mobile environmental sensor stations as defined in the NTCIP 1204: Environmental Sensor Station Interface Standard.
"location_state",
"flag_7", -- This field contains the flag for a quality check for each data record. The SourceId field determines which quality check is depicted in this field. For SourceId = 1, this field depicts the Persistence Test. This test verifies that the time-series of observations from a single sensor are not constant, which would indicate that the sensor has failed and is repeatedly reporting the same value. For SourceId = 2, this field depicts the Sensor Range Test which verifies that an observation is within the hardware capability range for which the sensor used to generate the observation can provide accurate data. It is only applied when information about the valid range of observations from a sensor is known. Quality checking results are interpreted from the output as follows: "P" - The observation passed the quality check. "N" - The observation did not pass the quality check. "-" - The quality check was not run for the observation, usually due to insufficient background field data.
"confvalue", -- This field contains the quality check confidence value. This field is a number that is representative of the quality check performed on the data record across multiple criteria. An algorithm is run to evaluate the results of all the quality checking. If the quality checking results indicate all passed, then a value of 1 is assigned. If no quality checking passes, then a value of 0 is assigned. If only some of the results pass, then a value between 0 and 1 is assigned.
"siteid", -- This field is the identifier of the site at which the associated platform is located. The site is the physical location to which the corresponding platform (vehicle) is assigned. For this data file, the SiteID field contains only one value: “5710” as only a single location was used for data collection demonstration.
"platformid", -- This field is the unique identifier of the platform (vehicle or station) to which the sensor recording the observation is associated. The referred platform is the vehicle on which the associated sensor was mounted during the data recording. There were no stationary platforms involved during the collection of this data.
"flag_6", -- This field contains the flag for a quality check for each data record. The SourceId field determines which quality check is depicted in this field. For SourceId = 1, this field depicts the Like Instrument Test. This test verifies that observations from similar sensors on the same platform are within tolerance to each other. For SourceId = 2, this field depicts the Persistence Test which detects whether sensor readings remain constant for a predefined period of time, which varies depending on the observation type. Consecutive sensor readings from the same sensor over the defined period preceding the current observation time are obtained. If one or more of the consecutive sensor values are different, the current sensor reading passes the persistence quality check. If all consecutive sensor values over the given time period are equivalent, the current sensor does not pass the persistence quality check. Quality checking results are interpreted from the output as follows: "-" - The quality check was not run for the observation, usually due to insufficient background field data. "/" - The quality check was not configured to run for the observation type.
"longitude", -- This field is the longitude value of the location at which the recording sensor recorded the observation.
"sensorindex", -- This field contains the index of the sensor which recorded that observation. For example, if there were two external sensors on the vehicle recording surface temperature, then one of them would have a sensorIndex of “0” and the other one would have a sensorIndex of “1”.
"flag_2", -- This field contains the flag for a quality check for each data record. The SourceId field determines which quality check is depicted in this field. For SourceId = 1, this field depicts the Manual test which indicates an observation that the contributor does not want the end user to trust. For this SourceId, this field contains only one value: “-“. For SourceId = 2, this field depicts the Climate Range Test which detects sensor readings that fall outside predetermined climate range values. The climate range data were drawn from 30 years of National Centers for Environmental Prediction - Department of Energy (NCEP-DOE) Reanalysis 2 data. Bounds for the climate range test were determined by computing monthly minimum and maximum values over a two-and-a-half degree by two-and-a-half degree fixed latitude-longitude grid. Quality checking results are interpreted from the output as follows: "P" - The observation passed the quality check. "N" - The observation did not pass the quality check. "-" - The quality check was not run for the observation, usually due to insufficient background field data. "/" - The quality check was not configured to run for the observation type.
"flag_5", -- This field contains the flag for a quality check for each data record. The SourceId field determines which quality check is depicted in this field. For SourceId = 1, this field depicts the Step Test which verifies that the change between an observation and previous observations from the same sensor is less than the maximum rate of change allowed for the sensor hardware that produced the observations. For SourceId = 2, this field depicts the Neighboring Vehicle Test. The Neighboring Vehicle Test compares the given vehicle observation to neighboring vehicles in the road segment. Specifically, the standard deviation and the mean of the observations along a 1-mile road segment during a 5-minute snapshot are taken, and then each observation is checked if it falls within a standard deviation, multiplied by a constant, of the mean of the road segment. Quality checking results are interpreted from the output as follows: "P" - The observation passed the quality check. "N" - The observation did not pass the quality check. "-" - The quality check was not run for the observation, usually due to insufficient background field data. "/" - The quality check was not configured to run for the observation type.
"flag_1", -- This field contains the flag for a quality check for each data record. The SourceId field determines which quality check is depicted in this field. For SourceId = 1, this field depicts the Complete Test which test indicates whether that all of the WxDE quality check calculations configured for each observation type are done with their work. For SourceId = 2, this field depicts the Combined Algorithm Test which is designed to take the results of all the previous quality check tests and combine them in order to assign a confidence to the observation that is being quality checked. It does not equate directly to pass/fail as do the other tests. Sensor Range Test. The sensor range test detects readings that fall outside the range of sensor hardware specifications or theoretical limits (i.e., a maximum and minimum value). If the observation value is greater than or equal to the minimum, and less than or equal to the maximum, the sensor reading passes this quality check. Quality checking results are interpreted from the output as follows: "P" - The observation passed the quality check. "N" - The observation did not pass the quality check. "/" - The quality check was not configured to run for the observation type.
"sensorid", -- This field contains the unique identifier of the sensor which recorded that observation. This field has a total of 104 different and unique identifiers which do not follow any given continuum.
"elevation", -- This field is the elevation value (in meters) of the location at which the recording sensor recorded the observation.
"obstypename", -- This field contains a text descriptor for the type of observation made in that record. For example, observations of air temperature, atmospheric pressure, wiper status, traction control state would be accompanied by corresponding text descriptors in ObsTypeName field. (see Appendix A)
"obstypeid", -- This field identifies the identification number, per the WxDE definitions, of the type of the observation made in that record. The identifier and the corresponding description of observation type is as follows: 554 - ESS Atmospheric Pressure, 575 – ESS Dew-point Temperature, 581 – ESS Relative Humidity, 5733 – ESS Air Temperature, 51138 – ESS Surface Temperature, 511319 – Pavement Sensor temperature Depth, 2000001 – Can Wiper Status, 2000002 – Can Anti-lock Brake Status, 2000004 – Can Brake Applied Status, 2000005 – Can Heading, 2000008 – Can Speed, 2000009 – Can Stability Control Status, 2000012 – Can Traction Control State, 2001179 – Can Atmospheric Pressure. (see Appendix A)
"location_zip",
"location_address",
":@computed_region_28hd_vqqn",
"observation", -- This field contains the recorded data value for that observation in metric units.
"location_city",
"flag_4", -- This field contains the flag for a quality check for each data record. The SourceId field determines which quality check is depicted in this field. For SourceId = 1, this field depicts the Climate Range Test which uses a 30-year climate record to verify that an observation is within a range of values expected for the location at which the observation was made at the time of year when the observation was made. For SourceId = 2, this field depicts the Nearest Surface Station. This quality check is not relevant any more. It has been updated/improved into Spatial Test (Barnes and/or IQR) – see columns Flag 9 and Flag 10. Quality checking results are interpreted from the output as follows: "P" - The observation passed the quality check. "N" - The observation did not pass the quality check. "-" - The quality check was not run for the observation, usually due to insufficient background field data. "/" - The quality check was not configured to run for the observation type.
"location", -- Column to generate visualization.
"englishvalue", -- This field contains the recorded data value for that observation in English unit.
"contribid", -- This field is the unique identifier, per the WxDE definitions, of the data contributor for this record. For the concerned data, this field contains only one value: ‘26” since MDOT was the sole contributor for this data.
"platformcode", -- This field is the unique identifier of the platform used for data recording. This field is provided by the data contributor (MDOT) and is unique to each vehicle. MDOT names their vehicles based on the cell phone number in the vehicle and/or a partial Vehicle Identification Number (VIN). This field has a one-to-one correspondence with the PlatformID.
"flag_3", -- This field contains the flag for a quality check for each data record. The SourceId field determines which quality check is depicted in this field. For SourceId = 1, this field depicts the Sensor Range Test which verifies that an observation is within the hardware capability range for which the sensor used to generate the observation can provide accurate data. It is only applied when information about the valid range of observations from a sensor is known. For SourceId = 2, this field depicts the Model Analysis Test which compares the temperature and pressure observations from the vehicles to those of a numerical weather model analysis field for the closest grid point. Quality checking results are interpreted from the output as follows: "P" - The observation passed the quality check. "N" - The observation did not pass the quality check. "/" - The quality check was not configured to run for the observation type.
"sourceid", -- This field is the identifier of the source system that was used for quality checking. The value is “1” if the quality checking was performed using the WxDE (Weather Data Environment), and the value is “2” if the quality checking was performed using the VDT (Vehicle Data Translator).
"units" -- This field contains the name of the metric unit for the observation.
FROM
"datahub-transportation-gov/road-weather-demonstration-data-jc5e-psbx:latest"."road_weather_demonstration_data"
LIMIT 100;
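As a sketch of a more targeted query, you could aggregate over the columns documented above, for example counting rows per observation type (the column names come from the data dictionary in the query above):
SELECT
    "obstypename",
    count(*) AS n_observations
FROM
    "datahub-transportation-gov/road-weather-demonstration-data-jc5e-psbx:latest"."road_weather_demonstration_data"
GROUP BY "obstypename"
ORDER BY n_observations DESC;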
Connecting to the DDN is easy. All you need is an existing SQL client that can connect to Postgres. As long as you have a SQL client ready, you'll be able to query datahub-transportation-gov/road-weather-demonstration-data-jc5e-psbx with SQL in under 60 seconds.
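For instance, with psql you can run a one-off query straight from the command line. A minimal sketch; the credential placeholders are assumptions to be replaced with your own:
psql "postgresql://<api_key>:<api_secret>@data.splitgraph.com:5432/ddn" \
  -c 'SELECT count(*) FROM "datahub-transportation-gov/road-weather-demonstration-data-jc5e-psbx:latest"."road_weather_demonstration_data";'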
Query Your Local Engine
bash -c "$(curl -sL https://github.com/splitgraph/splitgraph/releases/latest/download/install.sh)"
Read the installation docs.
Splitgraph Cloud is built around Splitgraph Core (GitHub), which includes a local Splitgraph Engine packaged as a Docker image. Splitgraph Cloud is basically a scaled-up version of that local Engine. When you query the Data Delivery Network or the REST API, we mount the relevant datasets in an Engine on our servers and execute your query on it.
It's possible to run this engine locally. You'll need a Mac, Windows or Linux system to install sgr, and a Docker installation to run the engine. You don't need to know how to actually use Docker; sgr can manage the image, container and volume for you.
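For example, once sgr is installed, spinning up the engine looks roughly like this (a sketch; it assumes Docker is running and that the sgr engine subcommands behave as documented):
# Create and start a local engine; sgr pulls the image and sets up
# the container and data volume for you.
sgr engine add
# List engines to confirm it is running.
sgr engine list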
There are a few ways to ingest data into the local engine.
For external repositories, the Splitgraph Engine can "mount" upstream data sources by using sgr mount. This feature is built around Postgres Foreign Data Wrappers (FDW). You can write custom "mount handlers" for any upstream data source. For an example, we blogged about making a custom mount handler for HackerNews stories.
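As an illustrative sketch, mounting an upstream Postgres database with the built-in postgres_fdw handler might look like this (every connection detail and option below is a placeholder, not part of this repository):
# Mount a remote Postgres schema as the local schema "example_data".
sgr mount postgres_fdw example_data -c user:password@upstream-host:5432 \
  -o '{"dbname": "origindb", "remote_schema": "public"}'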
For hosted datasets (like this repository), where the author has pushed Splitgraph Images to the repository, you can "clone" and/or "checkout" the data using sgr clone and sgr checkout.
Cloning Data
Because datahub-transportation-gov/road-weather-demonstration-data-jc5e-psbx:latest is a Splitgraph Image, you can clone the data from Splitgraph Cloud to your local engine, where you can query it like any other Postgres database, using any of your existing tools.
First, install Splitgraph if you haven't already.
Clone the metadata with sgr clone. This will be quick, and does not download the actual data.
sgr clone datahub-transportation-gov/road-weather-demonstration-data-jc5e-psbx
Checkout the data
Once you've cloned the data, you need to "checkout" the tag that you want. For example, to checkout the latest tag:
sgr checkout datahub-transportation-gov/road-weather-demonstration-data-jc5e-psbx:latest
This will download all the objects for the latest tag of datahub-transportation-gov/road-weather-demonstration-data-jc5e-psbx and load them into the Splitgraph Engine. Depending on your connection speed and the size of the data, you will need to wait for the checkout to complete. Once it's complete, you will be able to query the data like you would any other Postgres database.
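As a quick check, you can run a query through sgr's built-in SQL runner. A sketch; note that a checked-out repository lives in a schema named after the repository, without the tag:
sgr sql 'SELECT count(*) FROM "datahub-transportation-gov/road-weather-demonstration-data-jc5e-psbx"."road_weather_demonstration_data"'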
Alternatively, use "layered checkout" to avoid downloading all the data
The data in datahub-transportation-gov/road-weather-demonstration-data-jc5e-psbx:latest is 0 bytes. If the data is too big to download all at once, or you only need to query a subset of it, you can use a layered checkout:
sgr checkout --layered datahub-transportation-gov/road-weather-demonstration-data-jc5e-psbx:latest
This will not download all the data. Instead, it will create a schema of foreign tables that you can query as you would any other data. Splitgraph will lazily download the required objects as you query the data. In some cases, this might be faster or more efficient than a regular checkout.
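For example, with a layered checkout a filtered query like the sketch below only pulls the objects needed to answer it (the sourceid values "1" and "2" are documented in the data dictionary above; treating the column as text is an assumption):
SELECT "timestamp", "obstypename", "observation", "units"
FROM "datahub-transportation-gov/road-weather-demonstration-data-jc5e-psbx"."road_weather_demonstration_data"
WHERE "sourceid" = '1'
LIMIT 10;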
Read the layered querying documentation to learn about when and why you might want to use layered queries.
Query the data with your existing tools
Once you've loaded the data into your local Splitgraph Engine, you can query it with any of your existing tools. As far as they're concerned, datahub-transportation-gov/road-weather-demonstration-data-jc5e-psbx is just another Postgres schema.
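For example, you can point psql at the local engine and query the schema directly. A sketch: the user, password and database below are assumptions based on common engine defaults, so use whatever you configured when creating the engine.
# Connect to the local Splitgraph Engine (placeholder credentials).
psql "postgresql://sgr:<password>@localhost:5432/splitgraph"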