Query the Data Delivery Network
Query the DDNThe easiest way to query any data on Splitgraph is via the "Data Delivery Network" (DDN). The DDN is a single endpoint that speaks the PostgreSQL wire protocol. Any Splitgraph user can connect to it at data.splitgraph.com:5432
and query any version of over 40,000 datasets that are hosted or proxied by Splitgraph.
For example, you can query the lobbyist_statement_of_registration_beginning_2019
table in this repository, by referencing it like:
"ny-gov/lobbyist-statement-of-registration-beginning-2019-se5j-cmbb:latest"."lobbyist_statement_of_registration_beginning_2019"
or in a full query, like:
SELECT
":id", -- Socrata column ID
"compensation_type", -- Output selections include: Annually, Bi-weekly, Daily, Hourly, Monthly, One-time, Quarterly, Weekly, or Range. Data exception: If ‘Range’ is selected as ‘Compensation Type’ data will display as two compensation amounts separated by a hypen. Example: 5000-6000 Only applicable if ‘Reportable Compensation (and Expenses) is selected as the ‘Description of the Agreement’.
"lobbying_subjects", -- Filer selected information via a drop-down menu. Description of the general subject/issues expected to be lobbied on during the biennial period. Examples: Education – General; Energy & Natural Resources – Waste Management; Health – Medicine/ Medicaid.
"contract_end_date", -- Date formatted mm/dd/yyyy. The date the lobbying agreement between the Principal Lobbyist and Contractual is expected to end. The date all lobbying activity ceases between the two parties.
"contract_start_date", -- Date formatted mm/dd/yyyy. First date the Prinicpal Lobbyist is authorized to lobby on behalf of the Contractual Client.
"compensation_amount", -- Dollar amount. Only applicable if ‘Reportable Compensation (and Expenses) is selected as the ‘Description of the Agreement’. The amount of compensation expected to be paid to the Principal Lobbyist by the Contractual Client for lobbying services as reported in the lobbying agreement.
"individual_lobbyist_s", -- The first and last name of the individual(s) who personally engages in Direct or Grassroots Lobbying. Generally, these individuals either work for or are designated by the Principal Lobbyist and can include individuals lobbying on their own behalf.
"unique_id", -- This information is used for internal and technical data identification purposes only. It is not part of the filing information reported by the filer.
"reporting_year", -- A Principal Lobbyist submits a Lobbyist Statement of Registration for a biennial (two-year) period. Example: 2021-2022, 2023-2024.
"rbr_submitted", -- Output options are: Yes or No. Reportable Business Relationship is included as part of the Lobbyist Statement of Registration; however, data related to the disclosure has been isolated to a separate dataset.
"type_of_lobbying_relationship", -- Output options are: Employed or Retained. This describes the Type of Lobbying Relationship between the Principal Lobbyist and the Contractual Client. • If Employed is selected, the Contractual Client and Principal Lobbyist are the same. An Employed designation is when an organization is lobbying on their own behalf. • If Retained is selected, a lobbying organization has been engaged to lobby for the benefit of an unaffiliated Contractual Client.
"type_of_lobbying_communication", -- Output options are: Direct Lobbying, Grassroots Lobbying, Both (Direct and Grassroots). Distinguishes how attempts to influence the specific Lobbying Focus is expected to be conducted. If ‘Monitoring Only’ is selected, no Parties Lobbied will be affiliated to that specific Lobbying Focus (Lobbying Activity).
"principal_lobbyist_name", -- Organization or individual name. An individual or organization that is retained to lobby for a client, or that is lobbying on its own behalf. The Principal Lobbyist is registered in LA for a biennial (two-year) period and is required to submit Bi-Monthly reports detailing the lobbying activities that occurred during each Bi-Monthly Reporting period.
"form_submission_id", -- Form Identifying number to help users distinguish how many rows of information are attributatble to one distinct filing.
"filing_type", -- Output options are: ‘Original’; or ‘Amendment’. Distinguishes between an original filing or an amended filing.
"contractual_client_name", -- Organization or individual name. The name of the individual or organization that retains the services of a Principal Lobbyist for the benefit of itself or another.
"beneficial_client_name", -- Organization or individual name. The specific individual or organization on whose behalf and at whose request or behest Lobbying Activity is conducted. Beneficial Clients are generally referred to as the ‘true client’ as they are directly benefiting from the lobbying activity. A Beneficial Client and Contractual Client can be, and often are, the same.
"co_lobbyist", -- Organization or individual name. All Lobbyists who are retained by a Client on the same single retainer agreement or contract are Co-Lobbyists, and must file individual Lobbying reports with the Commission.
"sub_lobbyist", -- Organization or individual name. A Sub-Lobbyist is a Lobbyist who is engaged to perform services by another Lobbyist, as part of an agreement between the main Lobbyist and a Contractual Client. There are separate retainer agreements or contracts: one between the main Lobbyist and Client, and one between the main Lobbyist and Sub Lobbyist to do a portion of the lobbying that the main Lobbyist has agreed to perform on behalf of the Client.
"level_of_government", -- Output options are: State, Municipal, or Both (State and Municipal).
"description_of_agreement", -- Output options are: • Anticipate the $5,000 threshold will be exceeded; • Do not anticipate exceeding the $5,000 threshold; and • Pro Bono Lobbying Contract/Authorization. Every Statement of Registration requires a lobbying agreement/authorization (or contract) be included that outlines the terms of the lobbying relationship between the Principal Lobbyist and the Contract Client.
"reportable_compensation", -- Radio button selection related to the ‘Description of Agreement’. • If ‘Anticipate the $5,000 threshold will be exceeded’ is selected, output options are: Reportable Expenses Only; or Reportable Compensation (and Expenses). • If ‘Do not anticipate exceeding the $5,000 threshold’ is selected, output options are: Reportable Expenses Only; or Reportable Compensation (and Expenses). • If ‘Pro Bono Lobbying Contract/Authroization’ is selected, no reportable compensation and expenses is included.
"party_name", -- The ‘targets’ of lobbying activity. The people and/or entities Filers are attempting to influence. Parties Lobbied are selected from a drop-down list . The LA Database includes a list of ‘Party Names’ that, if selected, do not require any additional information from the Filer. This includes, but is not limited to, the names of every State Legislator and Legislative Committee (and ‘staff member’ of every such Legislator and Committee), the Governor, Lieutenant Governor, and Executive Chamber, and New York City Council Members
"government_body", -- Every ‘Party’ listed in the LA Database contains two components: the ‘Government Body’ and the main ‘Party’ name. Every ‘Party’ name is categorized into specific Government Bodies. Below is a list of all government bodies represented in LA available for the Registration. LA will automatically identify the Government Body associated to a Party Name. Filers may be required to identify additional information related to a selected Party Name. Output options are: 1. Senate/Assembly/Executive: a. Not Known at this Time; 2. Senate Committee; 3. Assembly Committee; 4. City; 5. County; 6. Executive Chamber; 7. Industrial Development Agency; 8. NYS Assembly; 9. NYS Indian Nations; 10. NYS Legislature; 11. NYS School Districts; 12. NYS Senate; 13. Senate Committee; 14. State Agency; 15. State and Local Public Authorities and Local Development Corporations; 16. Town ; 17. Village Improvement/Special Districts: a. County Special District; b. Town Special District; c. Consolidated Health District; d. Fire District; e. Independent Special District.
"focus_identifying_number", -- May include either an identifying number or description of the Lobbying Focus.
"lobbying_focus_type" -- Lobbying Activities include the Focus Type, an identifying number or description, and the targets of the lobbying for each Lobbying Activity reported. Lobbying Focuses are divided into State or Municipal Activities. State Lobbying Focuses: • State Bill; • State Executive Order; • State Funding; • State Land Use; • State Permits/Licensing; • State Procurement; • State Regulation/Rate-making/Rule; • State Resolution; • State Tribal Compact Agreement – NYS Indian Nations. Municipal Lobbying Focuses: • Municipal Bill; • Municipal Executive Order; • Municipal Funding; • Municipal Land Use; • Municipal Ordinance; • Municipal Permits/Licensing; • Municipal Procurement; • Municipal Rule/Regulation/Ratemaking/Rule; • Municipal Resolution. A filer also has the option to select ‘Unknown’ for the Lobbying focus; however, a description must be included.
FROM
"ny-gov/lobbyist-statement-of-registration-beginning-2019-se5j-cmbb:latest"."lobbyist_statement_of_registration_beginning_2019"
LIMIT 100;
Connecting to the DDN is easy. All you need is an existing SQL client that can connect to Postgres. As long as you have a SQL client ready, you'll be able to query ny-gov/lobbyist-statement-of-registration-beginning-2019-se5j-cmbb
with SQL in under 60 seconds.
This repository is an "external" repository. That means it's hosted elsewhere, in this case at data.ny.gov. When you queryny-gov/lobbyist-statement-of-registration-beginning-2019-se5j-cmbb:latest
on the DDN, we "mount" the repository using the socrata
mount handler. The mount handler proxies your SQL query to the upstream data source, translating it from SQL to the relevant language (in this case SoQL).
We also cache query responses on the DDN, but we run the DDN on multiple nodes so a CACHE_HIT
is only guaranteed for subsequent queries that land on the same node.
Query Your Local Engine
bash -c "$(curl -sL https://github.com/splitgraph/splitgraph/releases/latest/download/install.sh)"
Read the installation docs.
Splitgraph Cloud is built around Splitgraph Core (GitHub), which includes a local Splitgraph Engine packaged as a Docker image. Splitgraph Cloud is basically a scaled-up version of that local Engine. When you query the Data Delivery Network or the REST API, we mount the relevant datasets in an Engine on our servers and execute your query on it.
It's possible to run this engine locally. You'll need a Mac, Windows or Linux system to install sgr
, and a Docker installation to run the engine. You don't need to know how to actually use Docker; sgr
can manage the image, container and volume for you.
There are a few ways to ingest data into the local engine.
For external repositories (like this repository), the Splitgraph Engine can "mount" upstream data sources by using sgr mount
. This feature is built around Postgres Foreign Data Wrappers (FDW). You can write custom "mount handlers" for any upstream data source. For an example, we blogged about making a custom mount handler for HackerNews stories.
For hosted datasets, where the author has pushed Splitgraph Images to the repository, you can "clone" and/or "checkout" the data using sgr clone
and sgr checkout
.
Mounting Data
This repository is an external repository. It's not hosted by Splitgraph. It is hosted by data.ny.gov, and Splitgraph indexes it. This means it is not an actual Splitgraph image, so you cannot use sgr clone
to get the data. Instead, you can use the socrata
adapter with the sgr mount
command. Then, if you want, you can import the data and turn it into a Splitgraph image that others can clone.
First, install Splitgraph if you haven't already.
Mount the table with sgr mount
sgr mount socrata \
"ny-gov/lobbyist-statement-of-registration-beginning-2019-se5j-cmbb" \
--handler-options '{
"domain": "data.ny.gov",
"tables": {
"lobbyist_statement_of_registration_beginning_2019": "se5j-cmbb"
}
}'
That's it! Now you can query the data in the mounted table like any other Postgres table.
Query the data with your existing tools
Once you've loaded the data into your local Splitgraph engine, you can query it with any of your existing tools. As far as they're concerned, ny-gov/lobbyist-statement-of-registration-beginning-2019-se5j-cmbb
is just another Postgres schema.