Welcome to sgr

sgr is the open-source component that's at the core of Splitgraph. It's a tool that allows the user to manipulate data images (snapshots of SQL tables at a given point in time) as if they were code repositories by versioning, pushing and pulling them.
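
The versioning idea can be pictured with a toy in-memory model. This is only a sketch of the concept of immutable snapshots with commit and checkout. It is not sgr's API or storage format, and every name in it (ToyRepository, working, commit, checkout) is hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Row = Tuple                      # one table row
Tables = Dict[str, List[Row]]    # table name -> rows


@dataclass
class ToyRepository:
    """Hypothetical stand-in for a repository of data images."""

    images: Dict[str, Tables] = field(default_factory=dict)          # image hash -> snapshot
    parents: Dict[str, Optional[str]] = field(default_factory=dict)  # image hash -> parent hash
    head: Optional[str] = None                                       # currently checked-out image
    working: Tables = field(default_factory=dict)                    # the "checked out" tables

    def commit(self) -> str:
        """Freeze the working tables into a new immutable image."""
        payload = json.dumps([self.head, self.working], sort_keys=True)
        image_hash = hashlib.sha256(payload.encode()).hexdigest()[:12]
        # Copy the row lists so later edits to the working tables
        # cannot change the stored snapshot.
        self.images[image_hash] = {name: list(rows) for name, rows in self.working.items()}
        self.parents[image_hash] = self.head
        self.head = image_hash
        return image_hash

    def checkout(self, image_hash: str) -> None:
        """Replace the working tables with a copy of a stored image."""
        snapshot = self.images[image_hash]
        self.working = {name: list(rows) for name, rows in snapshot.items()}
        self.head = image_hash


repo = ToyRepository()
repo.working["fruits"] = [(1, "apple")]
v1 = repo.commit()
repo.working["fruits"].append((2, "orange"))
v2 = repo.commit()
repo.checkout(v1)  # the working tables are back to the first snapshot
```

In this model, checking out v1 restores the single-row table while v2 stays available in the repository, which mirrors how a code repository keeps every commit.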

PostgreSQL compatibility

sgr works on top of PostgreSQL and uses SQL for all versioning and internal operations. You can "check out" data into actual PostgreSQL tables, which gives you the read/write performance and features of PostgreSQL and lets you query the data with any SQL client. The client application has no idea that it's talking to an sgr table, and you don't need to rewrite any of your tools to use sgr. Anything that works with PostgreSQL will work with sgr.

Building data with Splitfiles

sgr also defines the declarative Splitfile language, which has Dockerfile-like caching semantics and lets you build Splitgraph repositories in a composable, maintainable and reproducible way. When you build data with Splitfiles, you get provenance tracking: you can inspect an image's metadata to find the exact upstream images, tables and columns that went into it. With one command, sgr can use this provenance data to rebuild an image against a newer version of its upstream dependencies. You can integrate sgr into your existing CI pipelines to keep your data up to date and stay on top of changes to its inputs.
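
To make the caching and provenance ideas concrete, here is a minimal sketch in Python. It assumes a simplified model in which each build step is keyed by its parent image and its command text, much like a Dockerfile layer cache. ToyBuilder, its methods and the repository name example/weather are hypothetical and are not part of sgr or of the Splitfile language.

```python
import hashlib
from typing import Dict, List, Tuple


def _digest(text: str) -> str:
    """Short deterministic identifier standing in for an image hash."""
    return hashlib.sha256(text.encode()).hexdigest()[:12]


class ToyBuilder:
    """Hypothetical builder; not sgr's API or its Splitfile executor."""

    def __init__(self) -> None:
        # (parent image hash, command text) -> resulting image hash
        self.cache: Dict[Tuple[str, str], str] = {}
        # image hash -> (upstream versions, commands) that produced it
        self.provenance: Dict[str, Tuple[Dict[str, str], List[str]]] = {}
        # commands that actually ran, i.e. cache misses
        self.executed: List[str] = []

    def build(self, upstream: Dict[str, str], commands: List[str]) -> str:
        # The starting point depends on the exact upstream image versions,
        # so a changed upstream version invalidates every later step.
        image = _digest(repr(sorted(upstream.items())))
        for command in commands:
            key = (image, command)
            if key not in self.cache:
                self.executed.append(command)  # stand-in for running the command
                self.cache[key] = _digest(f"{image}|{command}")
            image = self.cache[key]
        self.provenance[image] = (dict(upstream), list(commands))
        return image

    def rebuild(self, image: str, newer: Dict[str, str]) -> str:
        """Replay an image's recorded commands against newer upstream versions."""
        upstream, commands = self.provenance[image]
        return self.build({**upstream, **newer}, commands)


builder = ToyBuilder()
steps = ["import weather table", "aggregate by month"]
first = builder.build({"example/weather": "v1"}, steps)
again = builder.build({"example/weather": "v1"}, steps)    # served from the cache
newer = builder.rebuild(first, {"example/weather": "v2"})  # re-runs both steps
```

The second build runs nothing because every step is already cached, while the rebuild reads the recorded provenance and replays the same commands on top of the newer upstream version.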

Layered querying

You do not need to download the full Splitgraph image to query it. Instead, you can query Splitgraph images with layered querying, which uses Bloom filters and other metadata to download only the regions of the table relevant to your query. This is useful when you're exploring large datasets from your laptop, or when you're only interested in a subset of data from an image. Layered querying is still completely transparent to the client application, which sees a PostgreSQL schema that it can talk to using the Postgres wire protocol.
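
The pruning step can be illustrated with a small sketch. It assumes that a table is stored as fragments, each carrying a minimum and maximum key plus a Bloom filter over one column. The fragment layout, the filter parameters and all names below are assumptions made for illustration and do not describe sgr's actual metadata format.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

BLOOM_BITS = 256    # size of each toy filter, in bits
BLOOM_HASHES = 3    # number of bit positions set per value


def _bit_positions(value: str) -> List[int]:
    """Derive BLOOM_HASHES bit positions for a value from seeded SHA-256."""
    positions = []
    for seed in range(BLOOM_HASHES):
        digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
        positions.append(int.from_bytes(digest[:4], "big") % BLOOM_BITS)
    return positions


def make_bloom(values: Iterable[str]) -> int:
    """Build a Bloom filter, stored as an integer bitmask."""
    bits = 0
    for value in values:
        for pos in _bit_positions(value):
            bits |= 1 << pos
    return bits


def might_contain(bits: int, value: str) -> bool:
    """False means definitely absent; True means possibly present."""
    return all((bits >> pos) & 1 for pos in _bit_positions(value))


@dataclass
class Fragment:
    name: str
    min_id: int
    max_id: int
    country_bloom: int


def make_fragment(name: str, rows: List[Tuple[int, str]]) -> Fragment:
    """Summarise rows of (id, country) into fragment metadata."""
    ids = [row_id for row_id, _ in rows]
    return Fragment(name, min(ids), max(ids), make_bloom(c for _, c in rows))


def select_fragments(
    fragments: List[Fragment],
    id_range: Optional[Tuple[int, int]] = None,
    country: Optional[str] = None,
) -> List[Fragment]:
    """Return only the fragments that could hold rows matching the query."""
    selected = []
    for frag in fragments:
        if id_range is not None and (frag.max_id < id_range[0] or frag.min_id > id_range[1]):
            continue  # key range does not overlap the query
        if country is not None and not might_contain(frag.country_bloom, country):
            continue  # Bloom filter rules this fragment out
        selected.append(frag)
    return selected


fragments = [
    make_fragment("f1", [(1, "FR"), (2, "DE")]),
    make_fragment("f2", [(3, "US"), (4, "JP")]),
    make_fragment("f3", [(5, "FR"), (6, "BR")]),
]
to_download = select_fragments(fragments, id_range=(3, 4))
```

A Bloom filter never produces a false negative, so a fragment that really contains the value is always selected. It can produce false positives, which only means that an unnecessary fragment is occasionally downloaded and then filtered out by the query itself.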

Adding data to sgr

sgr does not limit your data sources to Postgres databases. It includes first-class support for importing and querying data from other databases using Postgres foreign data wrappers. You can create Splitgraph repositories from or query data in MongoDB, MySQL, CSV files, other Postgres databases, Elasticsearch clusters or Snowflake warehouses using the same interface.

Decentralized data sharing

Finally, sgr is peer-to-peer. You can push and pull data images to and from other sgr installations, and you can use sgr as a standalone tool to supercharge your data workflows. Splitgraph is also an sgr peer, letting you publish your datasets and make them easily queryable by your Web applications.