Get your data sorted.

sgr build votes.splitfile

Work with data
like you work with code.

Try it in five minutes
FROM splitgraph/uk_2017_ge:latest IMPORT {
  SELECT
    ons_code,
    SUM(CASE WHEN party_id = 'Conservative'
      THEN valid_votes ELSE 0 END)
      AS conservative_votes,
    SUM(CASE WHEN party_id = 'Labour'
      THEN valid_votes ELSE 0 END)
      AS labour_votes,
    SUM(valid_votes) AS total_votes
  FROM ward_results
  GROUP BY ons_code
} AS votes_by_party

SQL {
  CREATE TABLE london_votes AS SELECT
    lookup."PCON18NM" AS constituency,
    v.conservative_votes,
    v.labour_votes,
    v.total_votes,
    ST_Union(london.geom) AS geom
  FROM "splitgraph/london_wards:latest".city_merged_2018
    london
  JOIN "splitgraph/uk_wards".lookup_table lookup
    ON london.gss_code = lookup."WD18CD"
  JOIN votes_by_party v
    ON v.ons_code = lookup."PCON18CD"
  GROUP BY constituency,
    conservative_votes,
    labour_votes,
    total_votes
}

Explore Public Data

Explore over 40,000 datasets »

Build, combine and share data.

Powered by Postgres.
Inspired by Docker and Git.

 

Build composable datasets

Splitfiles let you use familiar SQL to build versioned datasets, or "data images": snapshots of a database, much as a Docker image is a snapshot of a filesystem. Merging public data with internal datasets is as simple as referencing them in a JOIN.

Learn more about Splitfiles
 

Keep data fresh and reproducible

With Splitgraph's provenance tracking, you know exactly where your data came from. Keep data images up-to-date with a single command when the sources change. Easily integrate Splitgraph into your CI pipeline to stay on top of changes to your data sources.

Learn more about data provenance and rebuilding data images.
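
To make the idea concrete, here is a minimal Python sketch of provenance-driven rebuilds. It is an illustration under stated assumptions, not Splitgraph's implementation: the `Image` class, the `needs_rebuild` helper, and the short version hashes are all hypothetical. It only shows how recording the exact source versions behind an image tells you when a rebuild is due.

```python
from dataclasses import dataclass, field


@dataclass
class Image:
    """A built data image plus the source versions it was derived from."""
    name: str
    # Hypothetical provenance record: source repository -> version hash used.
    provenance: dict = field(default_factory=dict)


def needs_rebuild(image: Image, latest_sources: dict) -> bool:
    """Return True if any recorded source now points at a different version."""
    return any(
        latest_sources.get(repo) != used_hash
        for repo, used_hash in image.provenance.items()
    )


# Illustrative values only; these hashes are invented for the example.
votes = Image(
    name="london_votes",
    provenance={
        "splitgraph/uk_2017_ge": "a1b2",
        "splitgraph/london_wards": "c3d4",
    },
)

unchanged = {"splitgraph/uk_2017_ge": "a1b2", "splitgraph/london_wards": "c3d4"}
updated = {"splitgraph/uk_2017_ge": "e5f6", "splitgraph/london_wards": "c3d4"}

print(needs_rebuild(votes, unchanged))  # False
print(needs_rebuild(votes, updated))    # True
```

A CI job could run a check of this kind on a schedule and trigger a rebuild only when it returns `True`.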
 

Share data with peers

Like Git, Splitgraph is peer-to-peer. Push data to any other Splitgraph instance or publish it to the catalog at Splitgraph Cloud, where you get bonus features like an instant, OpenAPI-compatible REST API for every version of your data.

Learn more about Splitgraph Cloud.
Try it in five minutes
Read our introductory blog post »

 Built with Postgres

 

Keep your existing tools

Anything that works with Postgres will work with Splitgraph. As far as your tools are concerned, a Splitgraph image is just another Postgres database. You can adopt Splitgraph incrementally while keeping your existing workflows and benefiting from the Postgres ecosystem.

See examples of common integrations
 

Ingest data from anywhere

Forget ETL and BI connectors. Splitgraph builds on foreign data wrappers (FDWs), a native PostgreSQL feature. Use any FDW to import data from common databases or from thousands of open government datasets. Or write a custom mount handler to import data from wherever you need.

Read more about ingesting data with FDWs
 

Save on costs

Stop paying for a bulky, always-on data warehouse. Splitgraph data can be stored in any S3-compatible object storage and downloaded on demand when it needs to be queried.

Read about layered querying
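
For a feel of what downloading on demand means, here is a small Python sketch. It assumes, purely for illustration, that each stored fragment carries the minimum and maximum key it contains, so a query can skip fragments that cannot match. The `Fragment` and `LazyTable` names, the `REMOTE` dictionary standing in for object storage, and the key-range index are all assumptions of this example rather than Splitgraph's actual storage layout.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """Metadata for one remote chunk of a table (hypothetical layout)."""
    object_id: str
    min_key: int
    max_key: int


class LazyTable:
    def __init__(self, fragments, fetch):
        self.fragments = fragments
        self.fetch = fetch  # callable: object_id -> list of (key, value) rows
        self.cache = {}     # object_id -> rows that were already downloaded

    def select_range(self, low, high):
        """Return rows with low <= key <= high, fetching only overlapping fragments."""
        rows = []
        for frag in self.fragments:
            if frag.max_key < low or frag.min_key > high:
                continue  # cannot contain matching rows, so skip the download
            if frag.object_id not in self.cache:
                self.cache[frag.object_id] = self.fetch(frag.object_id)
            rows.extend(r for r in self.cache[frag.object_id] if low <= r[0] <= high)
        return rows


# Stand-in for object storage; a real setup would read from an S3-compatible store.
REMOTE = {
    "obj_a": [(1, "x"), (5, "y")],
    "obj_b": [(10, "z"), (15, "w")],
    "obj_c": [(20, "v"), (25, "u")],
}
downloads = []


def fetch(object_id):
    downloads.append(object_id)
    return REMOTE[object_id]


table = LazyTable(
    [Fragment("obj_a", 1, 5), Fragment("obj_b", 10, 15), Fragment("obj_c", 20, 25)],
    fetch,
)
result = table.select_range(12, 22)
print(result)     # [(15, 'w'), (20, 'v')]
print(downloads)  # ['obj_b', 'obj_c']
```

Only two of the three fragments are fetched, and a repeated query is served from the local cache without touching storage again.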

Enhance every stage of the data lifecycle

Adopt Splitgraph incrementally,
where and when you need it.

  •  
    Ingest data from anywhere
    Import data from all major databases, set up Splitgraph as a Postgres replication client, or write a custom mount handler to cover your unique use case. Transform the data into a Splitgraph image, or leave it as-is and query it on demand.
    Read the FDW Documentation
  •  
    Layered querying
    Don't download the whole dataset just to run one SELECT. Splitgraph lets your software query remote data by lazily downloading only the required fragments.
    Learn about Layered Querying
  •  
    Instantly access thousands of open datasets
    Splitgraph comes bundled with a mount handler for Socrata, an open data platform that hosts tens of thousands of government datasets. You can use Splitgraph to mount any Socrata dataset as a Postgres table. You can even write JOIN queries across different data portals.
    Explore the Socrata repository
    Try joining data from two Socrata data portals
  •  
    Delta Compression
    Splitgraph tables are composed of delta-compressed objects. Keep track of how your data changed through history at low storage cost, and bring your datasets up to date without redownloading them.
    Learn how Splitgraph stores objects
  •  
    Content-addressable chunks
    Splitgraph objects are immutable and content-addressable, allowing Splitgraph to automatically deduplicate data and store multiple versions efficiently. Focus on what to put into your data warehouse, not how to store it.
    See content addressability in action
  •  
    S3-Compatible Blob Storage
    Splitgraph stores your data as columnar chunks in any S3-compatible object store, and Postgres only needs to keep track of lightweight metadata until you're ready to query it. Download data only when you need it, without a bulky always-on warehouse.
    Try an example of pushing to object storage
  •  
    Data Versioning
    Switch between different versions of your data, capture changes, and send and receive revisions, all without rewriting any of your tools, just like Git.
    Discover how change tracking works
  •  
    Command Line Client
    Manage Splitgraph data using a familiar command line interface inspired by Docker and Git.
    Discover the sgr CLI
  •  
    Python Library
    Interact with Splitgraph repositories and images using the full suite of Python data science tools, including Jupyter notebooks and Pandas DataFrames.
    Try an example Jupyter notebook
  •  
    Big-data Ready
    Splitgraph uses a columnar storage format for its data, offering a 5x-10x smaller on-disk footprint and faster read performance than native PostgreSQL tables. Keep your data in S3-compatible storage, and only download it when you need it.
    See benchmarking data on GitHub
  •  
    Splitfiles
    Define transformations on data using a declarative syntax that will be familiar to anyone who has written a Dockerfile. Enjoy full access to the SQL language, and reference other Splitgraph data images or foreign tables with a simple JOIN.
    Discover Splitfiles
  •  
    Provenance
    Datasets built with Splitfiles have all their sources recorded, meaning Splitgraph knows exactly where your data came from and when to rebuild it. Keep your data from drifting out of date when upstream sources change.
    See an example of provenance in the catalog
    Learn more about provenance
  •  
    Caching
    Rebuild data only if the sources have changed. Easily integrate Splitfiles into your CI pipeline to keep your data up to date and only download the changes to upstream datasets.
    See how Splitfiles can fit in your CI pipeline
  •  
    Extract, transform, transform
    Spend less time loading. Spend more time defining transformations between self-contained, immutable data images. Let Splitgraph worry about dependency graphs, so you can focus on results.
    Read about SQL layers in Splitfiles
  •  
    Peer-to-Peer
    Any Splitgraph engine can act as a remote peer. Push and pull data between Splitgraph installations, or publish it to Splitgraph Cloud using the same protocol.
    Try a decentralized demo
  •  
    Auto-generated REST API
    Get an instant, auto-generated OpenAPI-compatible REST API for every version of your data when you push to Splitgraph Cloud, thanks to the power of PostgREST. Query any version of your data with a simple HTTP request. More tools coming soon.
    Try the splitgraph/socrata REST API
  •  
    Access an ever-growing library of data
    Splitgraph Cloud is to Splitgraph what GitHub is to Git: a place where you can share and find public data. Enrich your private reports by joining your internal data with public data. Give back to the community by sharing your data, whether it's a brand-new dataset or a fresh take on public data.
    Explore the Splitgraph catalog
Read the Docs
Visit Splitgraph at GitHub »
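
Curious how content-addressable storage deduplicates data across versions? Here is a toy Python sketch of the general technique. It is not Splitgraph's object format: the `ObjectStore` class, the `commit` helper, fixed-size chunking, and SHA-256 over JSON are all assumptions chosen to keep the example short.

```python
import hashlib
import json


class ObjectStore:
    """Toy content-addressable store: each chunk is keyed by the hash of its content."""

    def __init__(self):
        self.objects = {}

    def put(self, rows):
        payload = json.dumps(rows, sort_keys=True).encode("utf-8")
        object_id = hashlib.sha256(payload).hexdigest()
        # Identical content hashes to the same ID, so it is stored only once.
        self.objects.setdefault(object_id, payload)
        return object_id


def commit(store, rows, chunk_size=2):
    """Split a table into fixed-size chunks and return the list of chunk IDs."""
    return [
        store.put(rows[i:i + chunk_size])
        for i in range(0, len(rows), chunk_size)
    ]


store = ObjectStore()
v1 = commit(store, [[1, "a"], [2, "b"], [3, "c"], [4, "d"]])
# Version 2 changes only the last row, so only the second chunk differs.
v2 = commit(store, [[1, "a"], [2, "b"], [3, "c"], [4, "e"]])

print(v1[0] == v2[0])      # True
print(len(store.objects))  # 3
```

Two versions of a four-row table need three stored chunks rather than four, because the unchanged chunk is shared between them.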

Why Splitgraph?

 

Work with data like code

Benefit from familiar conventions and concepts. We took the best ideas from our favorite development tools and applied them to the domain of data science.

Read our introductory blog post
 

Build composable data images

Write familiar SQL with simple JOINs to combine public and private datasets into versioned data images with provenance tracking.

Read more about Splitfiles
 

Treat your data like cattle

Stop treating your datasets like pets. Use Splitgraph to gain confidence about where data in your warehouse came from and how to rebuild it from scratch.

Read about our philosophy

Data scientists

spend 80% of their time

cleaning and preparing data

Time to try something new?
Explore public data »