Research

Splitgraph lets you work on data with your existing tools whilst also introducing concepts and best practices from the software engineering discipline.

Data Versioning

Splitgraph was partially inspired by Git and implements some of Git's most useful functions. You can check out, commit, tag, push and pull Splitgraph datasets just like you would with Git.
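To make the Git analogy concrete, here is a minimal, hypothetical sketch of commit, tag and checkout over a single in-memory table. It assumes each commit stores an immutable snapshot identified by a content hash and that a tag is just a name pointing at one of those hashes. The class and method names are invented for illustration. This is not Splitgraph's storage model or Python API, and push/pull are not modelled.

```python
import hashlib
import json
from typing import Dict, List, Optional

Row = Dict[str, object]


class ToyRepository:
    """Hypothetical in-memory model of Git-style versioning for one table."""

    def __init__(self) -> None:
        self.images: Dict[str, dict] = {}  # image hash -> {"parent": ..., "rows": [...]}
        self.tags: Dict[str, str] = {}     # tag name -> image hash
        self.head: Optional[str] = None    # currently checked-out image
        self.staging: List[Row] = []       # working copy that users edit

    def commit(self) -> str:
        # Copy rows so later edits to the working copy cannot alter the snapshot.
        rows = [dict(r) for r in sorted(self.staging, key=lambda r: json.dumps(r, sort_keys=True))]
        payload = json.dumps({"parent": self.head, "rows": rows}, sort_keys=True)
        image_hash = hashlib.sha256(payload.encode()).hexdigest()[:12]
        self.images[image_hash] = {"parent": self.head, "rows": rows}
        self.head = image_hash
        return image_hash

    def tag(self, name: str) -> None:
        # Point a human-readable name at the currently checked-out image.
        if self.head is None:
            raise ValueError("nothing committed yet")
        self.tags[name] = self.head

    def checkout(self, ref: str) -> None:
        # Accept either a tag name or a raw image hash.
        image_hash = self.tags.get(ref, ref)
        self.head = image_hash
        self.staging = [dict(r) for r in self.images[image_hash]["rows"]]


repo = ToyRepository()
repo.staging = [{"id": 1, "name": "alice"}]
v1 = repo.commit()
repo.tag("v1")
repo.staging.append({"id": 2, "name": "bob"})
v2 = repo.commit()
repo.checkout("v1")
print(repo.staging)
```

Because every image records its parent, the images form a history chain much like Git commits, and checking out an older tag restores that table state into the working copy.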

Splitgraph's versioning is implemented on top of the SQL standard. Any existing tool that speaks SQL, for example dbt, Jupyter or Metabase, can interact with a Splitgraph table and benefit from its change tracking capabilities.
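One common way to make change tracking transparent to SQL clients is to record writes at the database layer with triggers, so a tool issuing ordinary INSERT, UPDATE and DELETE statements gets tracked without knowing about it. The sketch below shows that general idea using Python's built-in sqlite3 module purely because it needs no server. The table and trigger names are hypothetical, and this is an illustration of the concept, not Splitgraph's actual mechanism (Splitgraph itself runs on a PostgreSQL-based engine).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE fruits (id INTEGER PRIMARY KEY, name TEXT);
    -- Hypothetical change log: one row per write made against fruits.
    CREATE TABLE fruits_changes (
        seq INTEGER PRIMARY KEY AUTOINCREMENT,
        action TEXT, id INTEGER, name TEXT
    );
    CREATE TRIGGER fruits_ins AFTER INSERT ON fruits BEGIN
        INSERT INTO fruits_changes (action, id, name) VALUES ('insert', NEW.id, NEW.name);
    END;
    CREATE TRIGGER fruits_upd AFTER UPDATE ON fruits BEGIN
        INSERT INTO fruits_changes (action, id, name) VALUES ('update', NEW.id, NEW.name);
    END;
    CREATE TRIGGER fruits_del AFTER DELETE ON fruits BEGIN
        INSERT INTO fruits_changes (action, id, name) VALUES ('delete', OLD.id, NULL);
    END;
    """
)

# A client issuing plain SQL writes; the triggers log each row-level change.
conn.execute("INSERT INTO fruits VALUES (1, 'apple'), (2, 'banana')")
conn.execute("UPDATE fruits SET name = 'mango' WHERE id = 2")
conn.execute("DELETE FROM fruits WHERE id = 1")
conn.commit()

changes = conn.execute("SELECT action, id, name FROM fruits_changes ORDER BY seq").fetchall()
for change in changes:
    print(change)
```

The change log is what makes a later "commit" cheap: only the logged differences need to be packaged, rather than a full copy of the table.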

The decentralized demo shows the basics of running Git operations on data.

sgr CLI

The sgr command line client is the easiest way to manipulate data images and manage your Splitgraph engine. It ships as a single binary, freeing you from needing to set up a working Python environment, and its sgr engine commands take care of running the Splitgraph engine in Docker.

Python Library

Splitgraph is partly written in Python, so it integrates directly with Python's vast data science ecosystem. You can manipulate Splitgraph images and repositories from your Python code or a Jupyter notebook, and export Splitgraph data to Pandas DataFrames, including through layered querying.
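A minimal sketch of the idea behind layered querying, assuming an image is stored as a stack of fragments in which each fragment holds rows keyed by primary key and marks deletions with None. A query is answered by overlaying the fragments oldest-first and skipping any fragment whose primary-key range cannot match, instead of first checking out the whole image. The Fragment type, its format and the layered_query function are hypothetical, chosen only to illustrate the technique; they are not Splitgraph's storage format or API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]


@dataclass
class Fragment:
    """Hypothetical stored object: rows keyed by primary key (must be non-empty).

    A value of None marks the row as deleted in this version.
    """
    rows: Dict[int, Optional[Row]]

    @property
    def pk_range(self) -> Tuple[int, int]:
        return min(self.rows), max(self.rows)


def layered_query(fragments: List[Fragment], lo: int, hi: int) -> List[Row]:
    """Return live rows with lo <= pk <= hi by overlaying fragments oldest-first."""
    result: Dict[int, Optional[Row]] = {}
    for fragment in fragments:
        first, last = fragment.pk_range
        if last < lo or first > hi:
            continue  # the range index shows nothing here can match; skip reading rows
        for pk, row in fragment.rows.items():
            if lo <= pk <= hi:
                result[pk] = row  # later fragments overwrite or delete earlier versions
    return [row for _, row in sorted(result.items(), key=lambda item: item[0]) if row is not None]


base = Fragment({1: {"id": 1, "v": "a"}, 2: {"id": 2, "v": "b"}, 3: {"id": 3, "v": "c"}})
change1 = Fragment({2: {"id": 2, "v": "B"}, 3: None})
change2 = Fragment({10: {"id": 10, "v": "z"}})
result = layered_query([base, change1, change2], lo=1, hi=5)
print(result)
```

Here change2 is never read because its key range (10 to 10) lies outside the query, which is the kind of saving that makes querying an image in place attractive compared with materializing it first.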

See the Jupyter/scikit-learn demo for a showcase.