Port 5432 is open...
Query 40k+ datasets with SQL
-- Join across two tables at different government data portals (Chicago and Cambridge)
-- Splitgraph will rewrite the queries into the providers' query language, get the data
-- and run the JOIN, returning the results over the PostgreSQL protocol.
SELECT
    cambridge_cases.date AS date,
    chicago_cases.cases_total AS chicago_daily_cases,
    cambridge_cases.new_positive_cases AS cambridge_daily_cases
FROM
    "cityofchicago/covid19-daily-cases-deaths-and-hospitalizations-naz8-j4nc".covid19_daily_cases_deaths_and_hospitalizations chicago_cases
FULL OUTER JOIN
    "cambridgema-gov/covid19-case-count-by-date-axxk-jvk8".covid19_case_count_by_date cambridge_cases
ON
    date_trunc('day', chicago_cases.lab_report_date) = cambridge_cases.date::timestamp
ORDER BY date ASC;
Connect to the
Data Delivery Network
with any PostgreSQL client.
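As a rough sketch of what connecting from a PostgreSQL client could look like in Python: the host `data.splitgraph.com` and port 5432 come from this page, while the database name `ddn`, the credential placeholders, and the helper names `build_ddn_dsn`, `qualified_table` and `fetch_rows` are assumptions made for illustration. `psycopg2` is a third-party driver and is only imported when a query is actually run.

```python
from urllib.parse import quote


def build_ddn_dsn(user: str, password: str,
                  host: str = "data.splitgraph.com",
                  port: int = 5432,
                  dbname: str = "ddn") -> str:
    """Build a PostgreSQL connection URI.

    The default dbname "ddn" is an assumption; check your account settings.
    Credentials are percent-encoded so special characters survive the URI.
    """
    return (f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
            f"@{host}:{port}/{dbname}")


def qualified_table(repository: str, table: str) -> str:
    """Quote a "namespace/repository" schema and a table name as SQL identifiers."""
    def quote_ident(name: str) -> str:
        # Standard SQL identifier quoting: wrap in double quotes, double any inside.
        return '"' + name.replace('"', '""') + '"'
    return f"{quote_ident(repository)}.{quote_ident(table)}"


def fetch_rows(dsn: str, sql: str):
    """Run one query and return all rows (needs psycopg2 and network access)."""
    import psycopg2  # third-party driver, imported lazily

    conn = psycopg2.connect(dsn)
    try:
        with conn.cursor() as cur:
            cur.execute(sql)
            return cur.fetchall()
    finally:
        conn.close()
```

With real credentials, something like `fetch_rows(build_ddn_dsn(key, secret), "SELECT * FROM " + qualified_table("cambridgema-gov/covid19-case-count-by-date-axxk-jvk8", "covid19_case_count_by_date") + " LIMIT 5")` would query one of the tables used in the example above.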
What is Splitgraph?
Splitgraph is an integrated
data catalog and database proxy.
The Splitgraph catalog indexes 40k+ data sources, including both live databases and versioned data snapshots called "data images." Discover data and explore it with features like an auto-generated REST API, schema documentation, and provenance tracking.
Connect to the Data Delivery Network (DDN) to query the catalog like it's a Postgres database. The DDN is a distributed SQL caching proxy built on the PostgreSQL wire protocol. It can route queries to any data in the catalog, whether that's a live database or a specific version of a data image.
Build & Share Data
Build versioned datasets from your own data, package them, and push them to the Splitgraph catalog for other people to discover and query. Store the data as column-oriented, delta-compressed objects in an S3-compatible object store. Push the metadata to Splitgraph peers, like Splitgraph.com.
Built Around an Open Core
Splitgraph.com is a hosted service built around Splitgraph Core.
It adds features like a public SQL proxy and data catalog.
Discover Data in the Catalog
We index 40k+ public datasets
& make them queryable with SQL.
Build Reproducible Data Snapshots
Combine data sources into reproducible data "images"
using a CI-friendly build process.
FROM demo/weather IMPORT rdu AS source_data

SQL CREATE TABLE monthly_summary AS ( \
    SELECT to_char(date, 'YYYYMM') AS month, \
        AVG(precipitation) AS average_precipitation, \
        AVG(snowfall) AS average_snowfall \
    FROM source_data \
    GROUP BY month \
    ORDER BY month ASC)
- Splitfiles: Define transformations on data using a declarative syntax that will be familiar to anyone who has written a Dockerfile. Enjoy full access to the SQL language, and reference other Splitgraph data images or foreign tables with a simple JOIN.
- Provenance: Datasets built with Splitfiles have all their sources recorded, meaning Splitgraph knows exactly where your data came from and when to rebuild it. Stay on top of your data and keep it from drifting out of date when upstream data sources change.
- Caching: Rebuild data only if the sources have changed. Easily integrate Splitfiles into your CI pipeline to keep your data up to date and only download the changes to upstream datasets.
- Data Versioning: Switch between different versions of your data, capture changes, send and receive revisions, and do it all without rewriting any of your tools, just like Git.
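The "rebuild only if the sources have changed" idea can be illustrated with a small fingerprinting sketch. This is not Splitgraph's implementation: the function names and the choice to hash source image hashes together with the build commands are assumptions made for the example.

```python
import hashlib
import json


def build_fingerprint(source_image_hashes: dict, commands: list) -> str:
    """Hash the pinned source versions together with the transformation steps.

    source_image_hashes maps a repository name to the image hash it was built
    from (recorded provenance); commands is the list of build steps.
    """
    payload = json.dumps(
        {"sources": source_image_hashes, "commands": commands},
        sort_keys=True,  # stable key order so equal inputs hash identically
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def needs_rebuild(previous_fingerprint, source_image_hashes: dict,
                  commands: list) -> bool:
    """True when there is no earlier build, or a source or command has changed."""
    return previous_fingerprint != build_fingerprint(source_image_hashes, commands)
```

A CI job could store the fingerprint next to the built image and skip the build step whenever `needs_rebuild` returns `False`.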
Push Data to Splitgraph
Push images to Splitgraph using an
immutable and content-addressable
- Peer-to-Peer: Any Splitgraph engine can act as a remote peer. Push and pull data between Splitgraph installations, or publish it to Splitgraph Cloud using the same protocol.
- Auto-generated REST API: Get an instant, auto-generated OpenAPI-compatible REST API for every version of your data when you push to Splitgraph Cloud, thanks to the power of PostgREST. Query any version of your data with a simple HTTP request. More tools coming soon.
- S3-Compatible Blob Storage: Splitgraph stores your data as columnar chunks in any S3-compatible object store, and Postgres only needs to keep track of lightweight metadata until you're ready to query it. Download data only when you need it, without the need for a bulky always-on warehouse.
$ sgr push votes_by_state
Pushing votes_by_state to splitgraph-demo/votes_by_state on remote data.splitgraph.com
Gathering remote metadata...
No objects to upload.
Uploaded metadata for 2 images, 1 table, 0 objects and 0 tags.
Setting upstream for votes_by_state to splitgraph-demo/votes_by_state.
Store & Query Efficiently
Save on storage costs with a columnar format and delta compression.
- Delta Compression: Splitgraph tables are composed of delta-compressed objects. Keep track of how your data changed through history at low storage cost and bring your datasets up to date without redownloading them.
- Content-addressable chunks: Splitgraph objects are immutable and content-addressable, allowing Splitgraph to automatically deduplicate data and store multiple versions efficiently. Focus on what to put into your data warehouse, not how to store it.
- Layered querying: Don't download the whole dataset just to run one SELECT. Splitgraph lets your software query remote data by lazily downloading only the required fragments.
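To make the content-addressing point concrete, here is a toy sketch of how immutable, content-addressed chunks deduplicate across versions. It is an illustration only: the `ChunkStore` class, the JSON serialization, the SHA-256 IDs and the fixed chunk size are all assumptions, not Splitgraph's actual object format.

```python
import hashlib
import json


class ChunkStore:
    """In-memory store whose object IDs are derived from object contents."""

    def __init__(self):
        self._objects = {}

    def put(self, rows) -> str:
        data = json.dumps(rows, sort_keys=True).encode("utf-8")
        object_id = hashlib.sha256(data).hexdigest()
        # Identical content yields the same ID, so it is stored only once.
        self._objects.setdefault(object_id, data)
        return object_id

    def get(self, object_id: str):
        return json.loads(self._objects[object_id])

    def __len__(self) -> int:
        return len(self._objects)


def commit_table(store: ChunkStore, rows: list, chunk_size: int = 2) -> list:
    """Split rows into fixed-size chunks and return the list of chunk IDs."""
    return [store.put(rows[i:i + chunk_size])
            for i in range(0, len(rows), chunk_size)]


store = ChunkStore()
v1 = commit_table(store, [[1, "a"], [2, "b"], [3, "c"], [4, "d"]])
v2 = commit_table(store, [[1, "a"], [2, "b"], [3, "c"], [4, "changed"]])
# The first chunk is shared by both versions; only the changed chunk is new.
print(len(store))
```

Each version is just a list of chunk IDs, so a reader only needs to fetch the chunks it does not already hold.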
Want Splitgraph for your business?
We're developing a "Private Cloud" product.
Want in on the beta? Get in touch.
Run Splitgraph Locally
Run a local Splitgraph Engine
on top of Postgres
to mount or clone data into tables.
Powered by Postgres
Plug into a growing ecosystem.
- Ingest data from anywhere: Splitgraph "mounting" is built on Postgres Foreign Data Wrappers (FDW). You can "mount" and import data from all major databases. You can set up Splitgraph as a Postgres replication client. Or you can write a custom mount handler to cover your unique use case. Transform the data into a Splitgraph image, or leave it as-is and query it on demand.
- Keep Your Existing Tools: Anything that works with Postgres will work with Splitgraph. As far as your tools are concerned, a Splitgraph image is just another Postgres database. You can adopt Splitgraph incrementally while keeping your existing workflows and benefiting from the Postgres ecosystem.