Working with Splitgraph
Your application will mostly interact with Splitgraph by running SQL queries, either on data that you add or on public data.
Here's a sample Splitgraph query:
SELECT COUNT(*) FROM "splitgraph/socrata:20200809".datasets
Splitgraph organizes data in collections of tables called repositories. In the query above, splitgraph/socrata is the repository we're querying. Repository names have two parts:
- Namespace, in this case splitgraph (this is similar to a GitHub/Docker organization)
- Repository, in this case socrata
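As a rough illustration of the two-part naming scheme, the following Python sketch splits a repository name at the slash. The function `split_repository_name` is a hypothetical helper written for this example and is not part of any Splitgraph client library.

```python
from typing import Tuple


def split_repository_name(name: str) -> Tuple[str, str]:
    """Split 'namespace/repository' into (namespace, repository).

    Hypothetical helper for illustration only.
    """
    namespace, slash, repository = name.partition("/")
    if not slash or not namespace or not repository:
        raise ValueError(f"expected 'namespace/repository', got {name!r}")
    return namespace, repository


namespace, repository = split_repository_name("splitgraph/socrata")
print(namespace, repository)  # prints "splitgraph socrata"
```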
Splitgraph repositories can be versioned or live.
A live repository acts as a "proxy" to a remote database. When you query a live repository, Splitgraph translates the inbound query to the remote database's query language and forwards it.
A versioned repository consists of multiple versions, or images. Each image is stored in a columnar format, inspired by modern cloud data warehouses like Snowflake.
The splitgraph/socrata repository is versioned. In the example query, we're querying a certain human-readable tag (20200809) that Splitgraph attached to the image to denote its version.
If you omit the version, Splitgraph will use the latest version of the dataset. These are equivalent:
SELECT COUNT(*) FROM "splitgraph/socrata".datasets
SELECT COUNT(*) FROM "splitgraph/socrata:latest".datasets
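If your application builds such queries itself, the repository name and optional tag have to end up inside one double-quoted identifier, as in the queries above. The sketch below assumes PostgreSQL-style identifier quoting (wrap in double quotes, double any embedded quote). The helpers `quote_identifier` and `count_rows_query` are hypothetical names chosen for this example, not a Splitgraph API.

```python
from typing import Optional


def quote_identifier(name: str) -> str:
    """Quote an identifier PostgreSQL-style (assumed to apply here)."""
    return '"' + name.replace('"', '""') + '"'


def count_rows_query(repository: str, table: str, tag: Optional[str] = None) -> str:
    """Build a COUNT(*) query against a table in a repository.

    When no tag is given, the tag is simply left out of the quoted name,
    which the text above describes as equivalent to using 'latest'.
    """
    qualified = repository if tag is None else f"{repository}:{tag}"
    return f"SELECT COUNT(*) FROM {quote_identifier(qualified)}.{quote_identifier(table)}"


print(count_rows_query("splitgraph/socrata", "datasets", tag="20200809"))
print(count_rows_query("splitgraph/socrata", "datasets"))
```

The table name is quoted as well. For an all-lowercase name such as datasets, this refers to the same table as the unquoted form.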
If you're familiar with PostgreSQL, it might help to treat repositories as schemas (in fact, "splitgraph/socrata" is a schema in the above query).
You can attach metadata like READMEs or topics to Splitgraph repositories to make them discoverable by other people. You can also make a repository private and control who can access it.
There are multiple ways to add data to Splitgraph:
- Uploading a CSV file from the Web or
- Setting up one of the over 100 SaaS sources or live queries to popular databases
- Writing to the Splitgraph DDN
- Pushing a data image from
Splitgraph can also run dbt for you on a schedule or on-demand, offering a simple way to transform repositories.
Once your dataset is published, you can add metadata like topics or a README file to make it easier for data consumers to discover.
You can also use the
to programmatically manage your repositories.
Finally, you can manage who can access or edit a given repository using Splitgraph's sharing options.
Splitgraph allows you to query data using a variety of methods: