Frequently Asked Questions
No. While Splitgraph lets you push and pull data between it and your local sgr instance, a lot of its functionality doesn't require you to download sgr.
Not quite. The sgr engine ships as a Docker image and is a customized version of PostgreSQL that is fully compatible with existing clients. In the future, we might repackage sgr as a PostgreSQL extension.
While it is possible to add sgr to existing PostgreSQL deployments, there
isn't currently a simple installation method. If you're interested in doing so,
you can follow the instructions in the Dockerfile used to build the engine or contact us.
You can also add the sgr engine as a PostgreSQL logical replication client, which will let you ingest data from existing databases without installing sgr on them.
With mounting, you can query data in other databases
(including MongoDB, MySQL, PostgreSQL or Elasticsearch) directly through
any PostgreSQL client. You do not
need to copy your data into PostgreSQL to use sgr.
We maintain a couple of Jupyter notebooks with benchmarks on our GitHub.
It's difficult to specify what is considered a benchmark for sgr, as for a lot of operations one would be benchmarking PostgreSQL itself. This is why we haven't run benchmarks like TPC-DS on sgr (for maximum performance, it's easy to check out a Splitgraph image into a PostgreSQL schema) but have tested the overhead of various sgr workloads over PostgreSQL:
- Committing and checking out Splitgraph images takes slightly less time than writing the same data to PostgreSQL tables (sgr moves data directly between PostgreSQL tables without query parsing overhead).
- Writing to PostgreSQL tables that are change-tracked by sgr is almost 2x slower than writing to untracked tables (sgr uses audit triggers to record changes rather than diffing the table at commit time).
- Splitgraph images take up 5x-10x less space than equivalent PostgreSQL tables because they are stored in a columnar format.
- Querying Splitgraph images directly without checkout (layered querying) can sometimes be faster and use less IO than querying PostgreSQL tables.
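The write overhead of change tracking noted above can be illustrated with a toy model. This is a minimal sketch in plain Python, not sgr's implementation: `TrackedTable`, `audit_log`, and `diff_snapshots` are hypothetical names. It only shows the trade-off between logging each write as it happens (the role an audit trigger plays) and comparing full snapshots at commit time.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Row = Tuple  # a row is a plain tuple of column values


@dataclass
class TrackedTable:
    """Toy table that logs every write, mimicking an audit trigger."""
    rows: Dict[int, Row] = field(default_factory=dict)
    # Each entry: (primary key, row before the write, row after the write)
    audit_log: List[Tuple[int, Optional[Row], Optional[Row]]] = field(default_factory=list)

    def upsert(self, pk: int, row: Row) -> None:
        # Extra bookkeeping on every write: this is the per-write overhead.
        self.audit_log.append((pk, self.rows.get(pk), row))
        self.rows[pk] = row

    def delete(self, pk: int) -> None:
        self.audit_log.append((pk, self.rows.pop(pk), None))

    def commit(self) -> Dict[int, Optional[Row]]:
        """Collapse the log into the net change per key, then clear it."""
        first_old: Dict[int, Optional[Row]] = {}
        last_new: Dict[int, Optional[Row]] = {}
        for pk, old, new in self.audit_log:
            first_old.setdefault(pk, old)
            last_new[pk] = new
        self.audit_log.clear()
        # Only touched keys are inspected; untouched rows are never read.
        return {pk: new for pk, new in last_new.items() if first_old[pk] != new}


def diff_snapshots(before: Dict[int, Row], after: Dict[int, Row]) -> Dict[int, Optional[Row]]:
    """Alternative: compare full snapshots at commit time (scans every row)."""
    keys = before.keys() | after.keys()
    return {pk: after.get(pk) for pk in keys if before.get(pk) != after.get(pk)}


table = TrackedTable(rows={1: ("apple", 3), 2: ("pear", 5)})
before = dict(table.rows)
table.upsert(1, ("apple", 4))
table.upsert(3, ("plum", 1))
table.delete(2)
changes = table.commit()
```

Both approaches yield the same set of changes here. The logged version pays a small cost on each write but touches only the modified keys at commit, whereas the snapshot diff has to scan the whole table.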
sgr has a few optimizations that make it suitable for working with large datasets:
- Datasets are partitioned into fragments stored in a columnar format, which is superior to row-format storage for OLAP workloads.
- You can query Splitgraph images without checking them out or even downloading them completely. With layered querying, sgr can lazily download a small fraction of the table needed for the query. This is still completely seamless to the client application.
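The idea of downloading only the fragments a query needs can be sketched as follows. This is an illustrative toy under stated assumptions, not sgr's API: `FragmentMeta`, `LazyTable`, and the `fetch` callback are hypothetical names, and it assumes each fragment's key range is known locally before any data is downloaded.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

RowT = Tuple[int, str]  # (primary key, value)


@dataclass(frozen=True)
class FragmentMeta:
    """Locally known metadata for one remote fragment (hypothetical layout)."""
    fragment_id: str
    min_key: int  # smallest primary key stored in the fragment
    max_key: int  # largest primary key stored in the fragment


class LazyTable:
    def __init__(self, index: List[FragmentMeta], fetch: Callable[[str], List[RowT]]):
        self.index = index
        self.fetch = fetch  # stands in for a network download
        self.cache: Dict[str, List[RowT]] = {}
        self.downloads: List[str] = []  # records which fragments were fetched

    def _load(self, fragment_id: str) -> List[RowT]:
        # Download a fragment at most once, on first use.
        if fragment_id not in self.cache:
            self.cache[fragment_id] = self.fetch(fragment_id)
            self.downloads.append(fragment_id)
        return self.cache[fragment_id]

    def query_range(self, lo: int, hi: int) -> List[RowT]:
        """Return rows with lo <= key <= hi, fetching only overlapping fragments."""
        result: List[RowT] = []
        for meta in self.index:
            if meta.max_key < lo or meta.min_key > hi:
                continue  # fragment cannot contain matching rows: skip the download
            result.extend(r for r in self._load(meta.fragment_id) if lo <= r[0] <= hi)
        return sorted(result)


# Simulated remote storage: three fragments of three rows each.
REMOTE = {
    "frag_a": [(1, "a"), (2, "b"), (3, "c")],
    "frag_b": [(4, "d"), (5, "e"), (6, "f")],
    "frag_c": [(7, "g"), (8, "h"), (9, "i")],
}
index = [FragmentMeta(fid, rows[0][0], rows[-1][0]) for fid, rows in REMOTE.items()]
table = LazyTable(index, fetch=lambda fid: REMOTE[fid])
rows = table.query_range(5, 6)
```

In this example only `frag_b` is fetched, since the other two fragments' key ranges cannot overlap the query. The caller just receives rows, which mirrors how the download step stays invisible to the client application.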
Because sgr is built on top of PostgreSQL, you can use the same methods for
horizontally scaling a PostgreSQL deployment to scale a