Storage

Storage format

Behind the scenes, a Splitgraph table is stored as one or more content-addressable, immutable chunks (we also use the terms "object" and "fragment").

Splitgraph uses cstore_fdw as its storage backend. cstore_fdw is a columnar store for PostgreSQL: because a query only reads the columns it needs, it offers better read performance and lower I/O load than row-oriented storage. Data stored in cstore_fdw can take up to 5x less space than equivalent PostgreSQL tables.
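To illustrate what "columnar" means, here is a toy sketch in plain Python. It is not how cstore_fdw lays out data on disk; the table and the `to_columnar` helper are hypothetical. It only shows that pivoting rows into per-column lists lets an aggregate over one column avoid touching the others. Keeping values of the same type together also tends to compress well, which is one reason columnar stores can use less space.

```python
from typing import Any

# Toy table in a row-oriented layout: each row holds every field.
rows = [
    {"id": 1, "name": "apple", "price": 1.20},
    {"id": 2, "name": "banana", "price": 0.50},
    {"id": 3, "name": "cherry", "price": 3.00},
]


def to_columnar(rows: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Pivot rows into one list per column (a column-oriented layout)."""
    columns: dict[str, list[Any]] = {key: [] for key in rows[0]}
    for row in rows:
        for key, value in row.items():
            columns[key].append(value)
    return columns


columns = to_columnar(rows)

# An aggregate like avg(price) only needs the "price" column here,
# while a row store would read every field of every row.
avg_price = sum(columns["price"]) / len(columns["price"])
```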

Delta compression

There are no limitations on how a Splitgraph table is partitioned: it can be represented by a single object, by multiple disjoint objects (each responsible for a different region of the table), or by multiple objects that overlap each other. This allows Splitgraph to store only the changes made to a table, so you can keep your data's entire history at a low storage cost.

Delta compression is optional: if sensitive data made its way into the history, you can always rechunk your image to delete it.
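A conceptual sketch of delta compression and rechunking, assuming a hypothetical fragment format in which each fragment maps a primary key to a new row value, or to `None` for a deletion. Splitgraph's actual fragment format is different; the `materialize` and `rechunk` helpers are illustrative only. Fragments are applied oldest first, so later fragments override earlier ones, and rechunking collapses the history into one fragment that holds only the current state.

```python
from typing import Optional

Row = tuple[str, float]
# Hypothetical fragment: primary key -> new row, or None to mark a deletion.
Fragment = dict[int, Optional[Row]]


def materialize(fragments: list[Fragment]) -> dict[int, Row]:
    """Apply fragments oldest-first; later fragments override earlier ones."""
    table: dict[int, Row] = {}
    for fragment in fragments:
        for key, row in fragment.items():
            if row is None:
                table.pop(key, None)  # deletion marker
            else:
                table[key] = row  # insert or update
    return table


def rechunk(fragments: list[Fragment]) -> list[Fragment]:
    """Collapse the history into a single fragment with only the current state."""
    squashed: Fragment = dict(materialize(fragments))
    return [squashed]


base: Fragment = {1: ("apple", 1.20), 2: ("banana", 0.50), 3: ("cherry", 3.00)}
# The newer image stores only what changed: one update and one deletion.
delta: Fragment = {2: ("banana", 0.45), 3: None}

history = [base, delta]
current = materialize(history)
```

After `rechunk(history)`, the old banana price and the deleted cherry row no longer appear in any fragment, which is how removing history gets rid of data that should not have been committed.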

Content addressability

Every Splitgraph object is immutable and content-addressable: the object's ID identifies its contents, and an object can't change once it has been created. This lets Splitgraph quickly determine what needs to be downloaded to bring a dataset up to date and optimize storage by deduplicating data.
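A minimal sketch of content addressing, assuming a hypothetical ID scheme: the SHA-256 digest of a canonical JSON serialization of the object's rows. Splitgraph's real object IDs are computed differently; the `object_id` and `put` helpers are for illustration only. Because the ID depends only on content, identical data is stored once, and deciding what to download reduces to comparing sets of IDs.

```python
import hashlib
import json

# Toy object store keyed by content-derived IDs.
store: dict[str, list[list]] = {}


def object_id(rows: list[list]) -> str:
    """Derive an ID from the object's contents (hypothetical scheme)."""
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def put(rows: list[list]) -> str:
    oid = object_id(rows)
    store.setdefault(oid, rows)  # identical content is stored only once
    return oid


a = put([[1, "apple"], [2, "banana"]])
b = put([[1, "apple"], [2, "banana"]])  # same content, same ID, no new copy
c = put([[3, "cherry"]])

# Only objects whose IDs are missing locally need to be downloaded.
local_ids = {a}
remote_image = [a, c]
to_download = [oid for oid in remote_image if oid not in local_ids]
```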

S3 storage

When Splitgraph images are pushed out to other instances, the objects containing the data itself can be uploaded to an S3-compatible storage provider like MinIO.

This lets the remote Splitgraph engine act as a lightweight metadata store, allowing for cheaper data warehousing: unused objects can be kept in object storage and only downloaded to the engine on demand, when they need to be queried.
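The on-demand behaviour can be sketched as a cache that fetches an object from object storage the first time a query needs it. This is a hypothetical model, not Splitgraph's implementation: `LazyObjectCache` is an invented name, and a plain dict stands in for an S3-compatible bucket such as MinIO.

```python
from typing import Callable


class LazyObjectCache:
    """Hypothetical engine-side cache that fetches objects on first use."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch
        self._local: dict[str, bytes] = {}
        self.downloads = 0

    def get(self, object_id: str) -> bytes:
        # Download only if the object isn't already on the engine.
        if object_id not in self._local:
            self._local[object_id] = self._fetch(object_id)
            self.downloads += 1
        return self._local[object_id]


# Stand-in for an S3-compatible bucket holding the actual data.
bucket = {"obj_a": b"rows of object a", "obj_b": b"rows of object b"}
cache = LazyObjectCache(fetch=bucket.__getitem__)


def query(object_ids: list[str]) -> int:
    """Pretend query: touches each object it needs and returns total bytes."""
    return sum(len(cache.get(oid)) for oid in object_ids)


query(["obj_a"])  # downloads obj_a
query(["obj_a", "obj_b"])  # obj_a is already local; downloads obj_b only
```

Against a real bucket, `fetch` could wrap a boto3 S3 client, for example `lambda oid: s3.get_object(Bucket=name, Key=oid)["Body"].read()`, where `s3` and `name` are assumed to be set up separately.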

See the object storage example for more information.