Data Delivery Network

Splitgraph Data Delivery Network (DDN) is a PostgreSQL-compatible SQL endpoint hosted at postgresql://data.splitgraph.com:5432/ddn that lets you query any dataset hosted on or indexed by Splitgraph with any PostgreSQL client.

In an SQL query, you can reference Splitgraph images as PostgreSQL schemas using the form namespace/repository:image_hash_or_tag. If the hash or tag is missing, Splitgraph will assume the :latest tag.

This means that these are equivalent:

SELECT * FROM "splitgraph/london_wards".city_merged_2018;
SELECT * FROM "splitgraph/london_wards:latest".city_merged_2018;
SELECT * FROM "splitgraph/london_wards:34802db1d71f69973d5e5845ab358b3e339077ed4f3f3fc790d2f89e87b4f433".city_merged_2018;

You can also run JOINs across multiple images or data sources. For example:

SELECT
    cambridge_cases.date AS date,
    chicago_cases.cases_total AS chicago_daily_cases,
    cambridge_cases.new_positive_cases AS cambridge_daily_cases
FROM
    "cityofchicago/covid19-daily-cases-deaths-and-hospitalizations-naz8-j4nc".covid19_daily_cases_deaths_and_hospitalizations chicago_cases
FULL OUTER JOIN
    "cambridgema-gov/covid19-cumulative-cases-by-date-tdt9-vq5y".covid19_cumulative_cases_by_date cambridge_cases
ON
    date_trunc('day', chicago_cases.lab_report_date) = cambridge_cases.date
ORDER BY date ASC;

Splitgraph SQL endpoint JOIN query example

We support most SQL clients and have explicitly tested DBeaver, pgcli, psql and VSCode.

We also support PostGIS, so querying PostGIS-enabled datasets will pass geographical data back to the client. In the case of DBeaver or pgAdmin, the client can even render the data on a map. For example:

SELECT name, gss_code, ST_Transform(ST_SetSRID(geom, 27700), 4326)
    FROM "splitgraph/london_wards".city_merged_2018

PostGIS data on Splitgraph SQL endpoint

Workspaces

Splitgraph's endpoint intercepts the client's introspection queries. When you log into the SQL endpoint with a client, its introspection will show you repositories that you own, as well as the repositories that are featured on Splitgraph. We call this feature a "workspace".

The workspace currently only shows a small subset of datasets available on Splitgraph: you can query any dataset on Splitgraph using SQL, not just the ones in the workspace.

Splitgraph workspace example

Connecting to the DDN

The endpoint is located at data.splitgraph.com. The connection parameters are:

  • Username: Splitgraph API key
  • Password: Splitgraph API secret
  • Host: data.splitgraph.com
  • Port: 5432
  • Database: ddn

With Splitgraph installed

If you have Splitgraph installed locally, the easiest way to query the endpoint is by running sgr cloud sql [query].

sgr cloud sql without arguments returns a libpq-compatible connection string. You can use it to connect to the endpoint with a client like pgcli or psql:

pgcli $(sgr cloud sql)
psql $(sgr cloud sql)

You need to be logged in to Splitgraph Cloud. See the Splitgraph Cloud introduction for more information.

You can find your Splitgraph API key and secret in your .sgconfig file, under the data.splitgraph.com section:

[data.splitgraph.com]
...
SG_ENGINE_USER=[API key]
SG_ENGINE_PWD=[API secret]

See the configuration reference for more information.

Without Splitgraph installed

You don't need to install anything to use the endpoint. Just go to splitgraph.com/connect to connect to the DDN. All you need to do is register or log into Splitgraph and get a SQL credential that you can plug into any SQL client.

SSL encryption and server certificate

We use mandatory SSL encryption on our endpoint (this is called sslmode=require in libpq terminology). Note that this protects you against eavesdropping, but not against MITM attacks, since PostgreSQL clients by default don't verify the server certificate (verify-ca or verify-full).

To enable full verification, you'll need access to data.splitgraph.com's root certificate. You can verify the certificate by running:

echo | openssl s_client -starttls postgres -connect data.splitgraph.com:5432 -showcerts

You can get the root certificate by running:

echo | openssl s_client -starttls postgres -connect data.splitgraph.com:5432  2>/dev/null | openssl x509 > ~/.splitgraph/data.splitgraph.com.crt

On pre-1.1.1 versions of s_client that don't support PostgreSQL, you can get data.splitgraph.com's certificate from HTTPS, since it presents the same certificate there:

echo | openssl s_client -connect data.splitgraph.com:443  2>/dev/null | openssl x509

You can then write the output into a file and give that file to clients like Google Data Studio that require SSL certificates.

Restrictions

We currently explicitly limit all queries to 10,000 rows and 30 seconds of execution. We might review these restrictions in the future.

We filter queries and only allow you to run SELECT and EXPLAIN statements, as well as some utility statements like SHOW and SET used by some SQL clients.

Client support and notes

DBeaver, pgcli, psql, VSCode

We support introspection hooking and querying for these tools and have explicitly tested them.

Google Data Studio

You will need the server's root certificate to set up the connection. You will also need a client certificate pair: this can be self-signed.

openssl req -newkey rsa:2048 -nodes -keyout client.key -x509 -days 365 -out client.crt

Note that there's a bug in the Data Studio which means that you have to upload your key to the client certificate field and the certificate to the key field.

Data discovery doesn't completely work on Google Data Studio, as it treats all tables as being in the public schema and doesn't schema-qualify them when making queries from the table picker. You can still use a custom query to import data into Google Data Studio.

Metabase

Data discovery (introspection query interception) doesn't work with Metabase, as our information_schema implementation returns slashes in schema names. See the GitHub issue for details.

SQL queries work, allowing you to use Splitgraph data in your Metabase dashboards.

DataGrip

Introspection query interception doesn't work on DataGrip (you will see an empty tree in the sidebar) but you can still run all SQL queries.

Roadmap

We aim to make the SQL endpoint the core of Splitgraph's offering, moving us to being a "data delivery network": a single endpoint giving you access to all your data with extra services like caching, granular access control, auditing, query firewalling or performance optimization.

The rough list of planned features for Splitgraph's SQL endpoint is currently:

  • Ability to add backend data sources to Splitgraph as "external repositories" on public Splitgraph or when deploying Splitgraph on-premises
  • Custom workspaces (for example, connecting to postgresql://user:pass@data.splitgraph.com:5432/workspace-1), letting you bookmark datasets to show up in your client
  • Scheduled snapshots of queries as Splitgraph images
  • Access control on inbound queries at the row/column level, PII masking