Use Cases

Version controlling your data

Splitgraph's Git-like version control capabilities allow you to see what changes were made to a dataset, or switch between different versions of a schema on the fly. Existing applications can get the benefits of Splitgraph's version control without having to be changed at all.

The decentralized demo shows the basics of running Git operations on data.

Our Jupyter/scikit-learn demo shows how to use Splitgraph to switch between training and validation versions of a machine learning dataset. The dbt example uses Splitgraph to run dbt against two different sources and compare their effects on the built dbt model.

Ingesting data from MongoDB, MySQL or other databases

Forget BI connectors or ETL jobs. With PostgreSQL foreign data wrappers, your applications can query other databases directly through Splitgraph, using a single protocol.

Since the Splitgraph engine is based on PostgreSQL, it can even be added as a PostgreSQL logical replication client and ingest data directly from your production database.

Querying and plotting Socrata data

Splitgraph has first-class support for querying datasets on the Socrata open data platform through SQL and using them in Splitfiles. We also maintain a catalog of over 40,000 open government datasets available for immediate querying.

See the Socrata example for a demonstration of exploring a Socrata endpoint with DBeaver and plotting a dataset with Metabase.

Sharing and publishing datasets

Much like Git, Splitgraph is decentralized and any Splitgraph engine can act as a remote. Changes pushed to and pulled from other Splitgraph instances are delta compressed, letting you keep your dataset's history without sacrificing storage. See the decentralized demo for an introduction to using Splitgraph in a peer-to-peer fashion.
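To make the idea of storing history as deltas concrete, here is a minimal Python sketch. It is a toy illustration only and not Splitgraph's actual storage format: the `compute_delta` and `apply_delta` helpers are hypothetical names, and a table is assumed to be a plain dictionary mapping a primary key to a row.

```python
from typing import Any, Dict, Optional, Tuple

Row = Tuple[Any, ...]
Table = Dict[Any, Row]            # primary key -> row
Delta = Dict[Any, Optional[Row]]  # primary key -> new row, or None for a delete


def compute_delta(old: Table, new: Table) -> Delta:
    """Return only the rows that differ between two versions of a table."""
    delta: Delta = {}
    for key, row in new.items():
        if old.get(key) != row:   # inserted or updated row
            delta[key] = row
    for key in old:
        if key not in new:        # deleted row
            delta[key] = None
    return delta


def apply_delta(old: Table, delta: Delta) -> Table:
    """Rebuild the newer version from the older version plus the delta."""
    result = dict(old)
    for key, row in delta.items():
        if row is None:
            result.pop(key, None)
        else:
            result[key] = row
    return result


# Two versions of a small table (hypothetical data).
v1 = {1: ("apple", 3), 2: ("pear", 5), 3: ("plum", 7)}
v2 = {1: ("apple", 3), 2: ("pear", 6), 4: ("fig", 1)}

# Only the changed, added and deleted rows need to be stored or transferred.
delta = compute_delta(v1, v2)
```

In this sketch the unchanged row with key 1 never appears in the delta, which is why keeping many versions of a mostly unchanged table does not multiply its storage cost.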

Splitgraph Cloud is to Splitgraph what the Docker registry is to Docker: a repository of publicly available data images that you can experiment with.

See the five-minute demo that will take you through the basics of building and publishing a dataset on Splitgraph Cloud, complete with provenance tracking and an automatic OpenAPI-compatible REST API.

Building data images with Splitfiles

When Docker and Dockerfiles came along, they changed the way we view software builds and deployments. Splitfiles were inspired by the same ideas. Splitfiles let you build Splitgraph images by using standard SQL statements, with efficient rebuilds and provenance tracking.
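One way to picture efficient rebuilds is the layer caching approach popularized by Docker: each build step is identified by a hash of its parent image and its command, and a step is only re-run when that hash has not been seen before. The Python sketch below illustrates that general idea under this assumption. It is not Splitgraph's implementation, and `step_hash`, `build` and the command strings are hypothetical.

```python
import hashlib
from typing import Callable, Dict, List


def step_hash(parent_hash: str, command: str) -> str:
    """Derive a step's identity from its parent image and its command text."""
    payload = (parent_hash + "\n" + command).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def build(commands: List[str], cache: Dict[str, str],
          run: Callable[[str], str]) -> List[str]:
    """Run each command in order, reusing cached results where possible.

    Returns the list of commands that actually had to be executed.
    """
    executed = []
    parent = "0" * 64  # stand-in hash for the empty starting image
    for command in commands:
        current = step_hash(parent, command)
        if current not in cache:
            cache[current] = run(command)  # cache miss: execute the step
            executed.append(command)
        parent = current                   # later steps depend on this one
    return executed


def fake_run(command: str) -> str:
    """Placeholder for executing a build step (assumption for this sketch)."""
    return "result of " + command


cache: Dict[str, str] = {}
first = build(["CREATE TABLE a AS SELECT 1", "CREATE TABLE b AS SELECT 2"],
              cache, fake_run)
second = build(["CREATE TABLE a AS SELECT 1", "CREATE TABLE b AS SELECT 3"],
               cache, fake_run)
```

Here the second build only executes the step whose command changed, because the first step's hash is already in the cache. The same chain of hashes also records which commands and parent images produced each result, which is the basic ingredient of provenance tracking.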

See the five-minute demo that shows you how to build a dataset using a Splitfile and publish it on Splitgraph.

Combining existing data images to build new data

Data is most interesting when it's combined with other data. Splitfiles make it easy to compose multiple data images together using familiar SQL and referencing them through JOIN operators.

See the US Election Splitfile that joins several source data images in multiple stages, or the PostGIS example that uses a Splitfile to build a geospatial dataset and plot it as a choropleth map.

Querying large datasets with your existing tools

Splitgraph checks data out into normal PostgreSQL tables, offering read-write performance and feature parity with PostgreSQL. Layered querying expands on that and allows Splitgraph to lazily download required table regions on demand at query execution time, which lets any of your existing tools query huge remote datasets with limited local cache space.
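The lazy download behavior can be sketched in a few lines of Python. This is a conceptual illustration and not Splitgraph's code: it assumes each remote table region carries the minimum and maximum key it contains, and the `Chunk`, `LazyTable` and `fake_fetch` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Chunk:
    """Metadata for one remote table region (hypothetical structure)."""
    name: str
    min_key: int
    max_key: int


def chunks_needed(chunks: List[Chunk], low: int, high: int) -> List[Chunk]:
    """Return the chunks whose key range overlaps the query range [low, high]."""
    return [c for c in chunks if c.max_key >= low and c.min_key <= high]


class LazyTable:
    """Downloads a chunk only the first time a query touches it."""

    def __init__(self, chunks: List[Chunk],
                 fetch: Callable[[Chunk], List[int]]) -> None:
        self.chunks = chunks
        self.fetch = fetch
        self.local: Dict[str, List[int]] = {}  # local cache: chunk name -> rows

    def query(self, low: int, high: int) -> List[int]:
        rows: List[int] = []
        for chunk in chunks_needed(self.chunks, low, high):
            if chunk.name not in self.local:
                self.local[chunk.name] = self.fetch(chunk)
            rows.extend(r for r in self.local[chunk.name] if low <= r <= high)
        return rows


downloads: List[str] = []


def fake_fetch(chunk: Chunk) -> List[int]:
    """Stand-in for a remote download; records which chunks were requested."""
    downloads.append(chunk.name)
    return list(range(chunk.min_key, chunk.max_key + 1))


table = LazyTable(
    [Chunk("c0", 0, 99), Chunk("c1", 100, 199), Chunk("c2", 200, 299)],
    fake_fetch,
)
result = table.query(150, 160)  # only chunk c1 overlaps this range
```

After this query, only one of the three chunks has been fetched. A later query that also touches `c1` reuses the local copy, so the amount of data downloaded depends on what the queries actually read and not on the size of the remote dataset.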

Check out our five-minute demo for an introduction to using Splitgraph to query public data. The Metabase example demonstrates using Metabase with Splitgraph to access and plot time series, categorical and geospatial data.