Working with Splitgraph

A sample Splitgraph query

Your application will mostly interact with Splitgraph by running SQL queries, either on data that you add or on public data.

Here's a sample Splitgraph query:

SELECT COUNT(*) FROM "splitgraph/socrata:20200809".datasets

Splitgraph organizes data in collections of tables called repositories. In this case, splitgraph/socrata is the repository we're querying. Repository names have two parts:

  • Namespace, in this case splitgraph (similar to a GitHub or Docker organization)
  • Repository, in this case socrata

Splitgraph repositories can be versioned or live.

A live repository acts as a "proxy" to a remote database. When you query a live repository, Splitgraph translates the inbound query to the remote database's query language and forwards it.

A versioned repository consists of multiple versions, or images. Each image is stored in a columnar format, inspired by modern cloud data warehouses like Snowflake.

The splitgraph/socrata repository above is versioned. The example query targets a specific image through its human-readable tag (20200809), which Splitgraph attached to the image to denote its version.

If you omit the version, Splitgraph will use the latest version of the dataset. These are equivalent:

SELECT COUNT(*) FROM "splitgraph/socrata".datasets
SELECT COUNT(*) FROM "splitgraph/socrata:latest".datasets

If you're familiar with PostgreSQL, it might help to treat repositories as schemas (in fact, "splitgraph/socrata" is a schema in the above query).
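If your application builds these queries in code, the naming rules above can be sketched as a small helper. This is a minimal illustration in Python, not part of any Splitgraph client library: RepoRef, parse_repo_ref and count_query are hypothetical names, and the only behavior assumed is what is described above (a namespace/repository name, an optional tag after a colon, and a missing tag meaning latest).

```python
from typing import NamedTuple


class RepoRef(NamedTuple):
    """Parts of a repository reference such as 'splitgraph/socrata:20200809'."""
    namespace: str
    repository: str
    tag: str


def parse_repo_ref(ref: str) -> RepoRef:
    """Split 'namespace/repository[:tag]' into its parts.

    A missing tag defaults to 'latest', mirroring the equivalence
    between "splitgraph/socrata" and "splitgraph/socrata:latest".
    """
    name, _, tag = ref.partition(":")
    namespace, sep, repository = name.partition("/")
    if not sep or not namespace or not repository:
        raise ValueError(f"expected 'namespace/repository[:tag]', got {ref!r}")
    return RepoRef(namespace, repository, tag or "latest")


def quote_ident(identifier: str) -> str:
    """Double-quote a SQL identifier, doubling any embedded quotes."""
    return '"' + identifier.replace('"', '""') + '"'


def count_query(ref: str, table: str) -> str:
    """Build a row-count query that treats the repository as a schema."""
    parsed = parse_repo_ref(ref)
    schema = f"{parsed.namespace}/{parsed.repository}:{parsed.tag}"
    return f"SELECT COUNT(*) FROM {quote_ident(schema)}.{quote_ident(table)}"


if __name__ == "__main__":
    print(count_query("splitgraph/socrata:20200809", "datasets"))
    print(count_query("splitgraph/socrata", "datasets"))
```

The schema name has to be double-quoted because it contains a slash and a colon. The helper also quotes the table name, which is harmless for an all-lowercase name like datasets.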

Discovering data

You can attach metadata like READMEs or topics to Splitgraph repositories to make them discoverable by other people. You can also make a repository private and control who can access it.

You can use Splitgraph's data catalog to search for repositories, or add your own.

Adding data

There are multiple ways to add data to Splitgraph:

Splitgraph can also run dbt for you on a schedule or on-demand, offering a simple way to transform repositories.

Once your dataset is published, you can add metadata like topics or a README file to make it easier for data consumers to discover. You can also use the splitgraph.yml format to programmatically manage your repositories.

Finally, you can manage who can access or edit a given repository using Splitgraph's sharing options.

Consuming data

Splitgraph allows you to query data using a variety of methods: