
Frequently Asked Questions

When should I use Splitgraph?

The main use case for Splitgraph is read-intensive Web applications that need to run analytical queries on large datasets.

For example, let's say you have an Observable notebook that needs to query a dataset that exceeds the maximum attachment size, or the dataset is located in a database that you can't access from a public notebook.

You can upload the dataset to Splitgraph or get Splitgraph to proxy to your database. Then, you can use Splitgraph's Observable connector to execute SQL queries over HTTP, delivering the results straight to your notebook.
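As a rough illustration, a query over the HTTP endpoint can be as simple as a fetch call from the notebook. The endpoint URL, request body, response shape, and table name below are assumptions made for the sake of the sketch; check the Splitgraph connector documentation for the exact API.

```typescript
// Minimal sketch of a SQL-over-HTTP query from a notebook or browser.
// The endpoint URL, request body and response shape are assumptions for
// illustration; the table name is a placeholder.
const DDN_HTTP_ENDPOINT = "https://data.splitgraph.com/sql/query/ddn"; // assumed URL

async function querySplitgraph(sql: string): Promise<unknown[]> {
  const response = await fetch(DDN_HTTP_ENDPOINT, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ sql }),
  });
  if (!response.ok) {
    throw new Error(`Query failed: ${response.status} ${response.statusText}`);
  }
  const payload = await response.json();
  return payload.rows; // assumed result field
}

// Datasets are addressed by their "namespace/repository" schema name.
const rows = await querySplitgraph(
  `SELECT * FROM "some-namespace/some-repository"."some_table" LIMIT 10`
);
console.log(rows);
```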

Or, imagine you are building a Web application with a framework like Next.js. You want to render a dashboard with various data visualizations and aggregations. In this case, your application can query Splitgraph from the server side with a PostgreSQL library or directly from the client's browser. With its columnar storage and query result caching, Splitgraph will ensure your dashboard renders quickly.
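As a sketch of the server-side approach, here is what a query might look like with the node-postgres ("pg") client. The hostname, database name, and table below are assumptions for illustration; use the connection details from your Splitgraph account.

```typescript
// Server-side sketch (e.g. a Next.js API route) using node-postgres.
// Hostname, database name and table are assumptions for illustration;
// use the connection details from your Splitgraph account.
import { Pool } from "pg";

const pool = new Pool({
  host: "data.splitgraph.com",            // assumed DDN hostname
  port: 5432,
  database: "ddn",                        // assumed database name
  user: process.env.SPLITGRAPH_API_KEY,
  password: process.env.SPLITGRAPH_API_SECRET,
  ssl: true,
});

// Aggregate query feeding a dashboard panel; the table name is a placeholder.
export async function getDashboardData() {
  const { rows } = await pool.query(
    `SELECT category, COUNT(*) AS n
       FROM "some-namespace/some-repository"."events"
      GROUP BY category
      ORDER BY n DESC`
  );
  return rows;
}
```

Because results are cached and the storage is columnar, aggregations like this stay fast even when the underlying dataset is large.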

Finally, another use case for Splitgraph is as a starter modern data stack (it's what we use ourselves!). You can get Splitgraph to ingest data on a schedule from over 100 SaaS services using Airbyte, run dbt models, organize the results in a catalog for your data team, and connect a BI tool like Metabase directly to Splitgraph to build dashboards.

When shouldn't I use Splitgraph?

You shouldn't use Splitgraph for classical Web applications that rely on transactions or large volumes of single-row reads and writes.

While Splitgraph's query endpoint supports writes, it's mostly intended for bulk writes and CREATE TABLE AS statements executed by dbt.

For that use case, we recommend a battle-tested OLTP database like PostgreSQL or MySQL, or a newer serverless database like Fauna or Cloudflare D1.

What does Splitgraph use for storage?

We use sgr, which is an open-source technology we originally developed for manipulating "data images".

sgr is based on PostgreSQL, which makes Splitgraph compatible with the PostgreSQL wire protocol and query syntax.

The storage architecture of sgr, and by extension Splitgraph, is similar to that of Snowflake (see the sketch after this list):

  • tables are partitioned into regions (sgr calls them "objects")
  • partitions can partially overlap each other, allowing for delta compression
  • each partition is stored in object storage as a column-oriented file (we use cstore_fdw, giving us compatibility with all PostgreSQL data types, including PostGIS)
  • compute nodes inspect inbound queries and download required partitions on demand to satisfy them (separating storage and compute).
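To make this concrete, here is a purely conceptual sketch of the idea (not sgr's actual data structures or APIs): a table is a set of column-oriented objects covering key ranges, and a query only needs to download the objects whose ranges can match its predicate.

```typescript
// Purely conceptual sketch of the storage model described above -- not sgr's
// actual data structures or APIs. A table is split into column-oriented
// "objects" kept in object storage; a query downloads only the objects it needs.
interface StorageObject {
  id: string;
  minKey: number; // min/max of an indexed column within the object,
  maxKey: number; // used to prune objects that can't match a predicate
}

interface TableMeta {
  name: string;
  objects: StorageObject[]; // objects may overlap, enabling delta compression
}

// Which objects must be fetched to answer `WHERE key BETWEEN lo AND hi`?
function objectsToFetch(table: TableMeta, lo: number, hi: number): StorageObject[] {
  return table.objects.filter((o) => o.maxKey >= lo && o.minKey <= hi);
}

const table: TableMeta = {
  name: "events",
  objects: [
    { id: "o1", minKey: 0, maxKey: 999 },
    { id: "o2", minKey: 1000, maxKey: 1999 },
    { id: "o3", minKey: 1500, maxKey: 2500 }, // overlaps o2 (a delta over it)
  ],
};
console.log(objectsToFetch(table, 1200, 1600).map((o) => o.id)); // ["o2", "o3"]
```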

If you want to learn more about how Splitgraph stores and queries data, read the advanced sgr documentation.

What's the story with Splitgraph and sgr?

Splitgraph and sgr are separate but interoperable products, and we currently aim to maintain this interoperability.

In particular, you can install the sgr engine for more advanced sgr usage, including being able to "clone" a repository from Splitgraph to your local machine or "push" data to Splitgraph.

If you're only interested in Splitgraph, you can use sgr as a CLI to automate certain Splitgraph tasks.

Who's using Splitgraph?

Trase: Commodity Footprints dashboard

Transparency for Sustainable Economies, or Trase, is a supply chain transparency initiative that transforms our understanding of globally traded agricultural commodities. It empowers companies, governments and others to address sustainability risks and opportunities by linking supply chain actors to production landscapes across the world.

Trase publishes most of its core data to Splitgraph at https://www.splitgraph.com/trase. In addition, Trase uses Splitgraph to power its Commodity Footprints dashboard. There is also an Observable notebook with an introduction to querying Trase data on Splitgraph.

Ourselves: enterprise data platform

A private deployment of Splitgraph is at the core of our own data platform. We use it to run Airbyte connectors that replicate data from SaaS services and our own production databases, run dbt to build a data model, and query that model with Metabase. See our blog post for more details.

You?

If you are using Splitgraph and would like to share your use case, feel free to get in touch!