Welcome to sgr
sgr is the open-source component at the core of Splitgraph. It lets you
manipulate data images (snapshots of SQL tables at a given point in time) as
if they were code repositories: you can version, push and pull them.
sgr works on top of PostgreSQL and uses SQL for all versioning and internal
operations. You can "check out" data into actual PostgreSQL tables, which gives
you the same read/write performance and features as PostgreSQL and lets you
query the data with any SQL client. The client application has no idea that
it's talking to an sgr table, and you don't need to rewrite any of your tools
to use sgr. Anything that works with PostgreSQL will work with sgr.
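The commit and checkout idea can be pictured with a toy model. This is only a sketch under loud assumptions: `sqlite3` stands in for PostgreSQL so the example is self-contained, images are stored as full snapshots in a dictionary, and the `commit` and `checkout` helpers are hypothetical names, not sgr's API or storage format.

```python
import sqlite3

# Toy model only: sqlite3 stands in for PostgreSQL, and every name below is
# hypothetical. This is not how sgr stores or restores images.
images = {}  # image name -> list of (id, name) rows


def commit(conn, image_name):
    """Snapshot the current contents of the working table."""
    images[image_name] = conn.execute(
        "SELECT id, name FROM fruits ORDER BY id"
    ).fetchall()


def checkout(conn, image_name):
    """Materialize a stored snapshot back into an ordinary table."""
    conn.execute("DELETE FROM fruits")
    conn.executemany("INSERT INTO fruits VALUES (?, ?)", images[image_name])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fruits (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO fruits VALUES (?, ?)", [(1, "apple"), (2, "orange")]
)
commit(conn, "v1")

conn.execute("UPDATE fruits SET name = 'guava' WHERE id = 2")
conn.execute("INSERT INTO fruits VALUES (3, 'mayonnaise')")
commit(conn, "v2")

# After a checkout the data lives in a plain table, so ordinary SQL works.
checkout(conn, "v1")
rows_v1 = conn.execute("SELECT name FROM fruits ORDER BY id").fetchall()
checkout(conn, "v2")
rows_v2 = conn.execute("SELECT name FROM fruits ORDER BY id").fetchall()
print(rows_v1)
print(rows_v2)
```

The point of the sketch is the last few lines: once an image is checked out, the client issues plain SQL against a regular table and needs no knowledge of the versioning layer.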
Building data with Splitfiles
sgr also defines the declarative Splitfile language with Dockerfile-like
caching semantics that allows you to build Splitgraph repositories in a
composable, maintainable and reproducible way. When you build data with
Splitfiles, you get provenance tracking. You can inspect an image's metadata to
find the exact upstream images, tables and columns that went into it. With one
command, sgr can use this provenance data to rebuild an image against a newer
version of its upstream dependencies. You can easily integrate sgr into your
existing CI pipelines to keep your data up-to-date and stay on top of changes
to its inputs.
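As a rough illustration of what Dockerfile-like caching and provenance tracking could look like, here is a minimal sketch. It assumes a scheme in which each build step's image hash is derived from its parent hash and the command text; the `build` function, the `cache` dictionary and the step strings are hypothetical placeholders, not Splitfile syntax or sgr internals.

```python
import hashlib

# Illustrative only: a hypothetical build cache, not sgr's implementation.
cache = {}     # image hash -> provenance record for the step that produced it
executed = []  # commands that actually ran, i.e. cache misses


def image_hash(parent_hash, command):
    """Derive an image hash from the parent image hash and the command text."""
    digest = hashlib.sha256(f"{parent_hash}\n{command}".encode())
    return digest.hexdigest()[:12]


def build(commands, upstream):
    """Run each command unless an image with the same lineage is cached.

    `upstream` maps a source repository name to the image hash it is
    pinned to; it is recorded as provenance for every image produced.
    """
    # Seed the chain with the pinned upstream images, so a newer upstream
    # version invalidates every step that follows it.
    current = image_hash("", repr(sorted(upstream.items())))
    for command in commands:
        current = image_hash(current, command)
        if current not in cache:
            executed.append(command)  # stand-in for running the step
            cache[current] = {"command": command, "upstream": dict(upstream)}
    return current


# Placeholder step descriptions, not real Splitfile commands.
steps = ["import table t from upstream", "create table summary from t"]

first = build(steps, {"upstream": "aaa111"})
runs_after_first = len(executed)

second = build(steps, {"upstream": "aaa111"})  # same inputs: all cache hits
runs_after_second = len(executed)

third = build(steps, {"upstream": "bbb222"})   # newer upstream: full rebuild
runs_after_third = len(executed)

print(cache[third]["upstream"])
```

Because the upstream image hashes are part of the chain, the provenance record doubles as the recipe for a rebuild: swapping in a newer upstream hash and calling `build` again reruns only what is no longer cached.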
You do not need to download the full Splitgraph image to query it. Instead, you can query Splitgraph images with layered querying, which will download only the regions of the table relevant to your query, using bloom filters and other metadata. This is useful when you're exploring large datasets from your laptop, or when you're only interested in a subset of data from an image. This is still completely transparent to the client application, which sees a PostgreSQL schema that it can talk to using the Postgres wire protocol.
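The pruning idea behind layered querying can be sketched as follows. This is a simplified illustration, assuming a table split into fragments that each carry a small bloom filter and a min/max range on a key; the `BloomFilter` class, the fragment layout and the query helpers are hypothetical and do not reflect sgr's actual object format or metadata.

```python
import hashlib


class BloomFilter:
    """Tiny bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits=256, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, value):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def make_fragment(rows):
    """Bundle rows with the metadata used to decide whether to fetch them."""
    bloom = BloomFilter()
    for row in rows:
        bloom.add(row["city"])
    ids = [row["id"] for row in rows]
    return {"rows": rows, "min_id": min(ids), "max_id": max(ids), "bloom": bloom}


def query_city(fragments, city):
    """Fetch only fragments whose bloom filter says the city may be present."""
    fetched = [f for f in fragments if f["bloom"].might_contain(city)]
    rows = [r for f in fetched for r in f["rows"] if r["city"] == city]
    return rows, len(fetched)


def query_id_range(fragments, lo, hi):
    """Fetch only fragments whose [min_id, max_id] range overlaps [lo, hi]."""
    fetched = [f for f in fragments if f["max_id"] >= lo and f["min_id"] <= hi]
    rows = [r for f in fetched for r in f["rows"] if lo <= r["id"] <= hi]
    return rows, len(fetched)


fragments = [
    make_fragment([{"id": 1, "city": "Oslo"}, {"id": 2, "city": "Lima"}]),
    make_fragment([{"id": 3, "city": "Cairo"}, {"id": 4, "city": "Quito"}]),
    make_fragment([{"id": 5, "city": "Hanoi"}, {"id": 6, "city": "Perth"}]),
]

city_rows, city_fetched = query_city(fragments, "Cairo")
range_rows, range_fetched = query_id_range(fragments, 3, 4)
print(city_rows, city_fetched)
print(range_rows, range_fetched)
```

The range query touches exactly one of the three fragments. The bloom filter query touches at least the fragment that holds the value, plus any fragment that happens to produce a false positive, so the number fetched is an upper bound on the work rather than an exact count.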
Adding data to sgr
sgr does not limit your data sources to Postgres databases. It includes
first-class support for importing and querying data from other databases using
foreign data wrappers.
You can create Splitgraph repositories from, or query data in,
other Postgres databases
using the same interface.
Decentralized data sharing
sgr is peer-to-peer. You can push and pull data images to and from other
sgr installations and use it as a standalone tool to supercharge your data
workflows. Splitgraph is also an
letting you publish your datasets and make them easily queryable by your Web