Splitgraph has been acquired by EDB! Read the blog post.

dbt

NOTE: This is an sgr-specific article. To learn how to use Splitgraph with dbt, see the corresponding Splitgraph page.

The recommended way of building Splitgraph data images is Splitfiles that offer Dockerfile-like caching, provenance tracking and efficient rebuilds.

However, there are plenty of other great tools for building datasets and, as long as they work with PostgreSQL, they too can benefit from sgr's data versioning, packaging and sharing capabilities.

One such tool is dbt that assembles data transformations from small building blocks, decreasing the amount of boilerplate. In a sense, dbt can be used as an advanced SQL templating engine.

Turning the source and the target schemas that dbt uses into Splitgraph repositories opens up a lot of opportunities:

  • No need to run development and production dbt models against the same warehouse. A Splitgraph image can be cloned and checked out on the development engine.
  • Performing what-if analyses becomes simple by switching between different versions of the source dataset and comparing the resultant images with sgr diff.
  • Built datasets can be pushed to other sgr engines, shared publicly or serve as inputs to a pipeline of Splitfiles.
  • Input datasets can leverage sgr's layered querying, allowing dbt to seamlessly query huge datasets with a limited amount of local disk space.
  • Input datasets can be backed by foreign data wrappers, allowing dbt to directly use data from a wide variety of databases without having to write an extra ETL job to load the data into the warehouse.

sgr dbt adapter

You don't need any extra plugins to use dbt with sgr, since you can use dbt's native PostgreSQL support to query the sgr engine. However, if you install the sgr dbt adapter, you will be able to reference sgr images directly from your dbt code. For example:

{{ config(materialized='table') }}

with source_data as (

    select domain, count(domain) as count
    from "splitgraph/socrata:latest".datasets
    group by domain

)

select *
from source_data

See our GitHub page for instructions on how to install and use the sgr dbt adapter as well as a sample dbt project.

Example

The dbt example showcases running dbt against the sgr engine, using sgr to swap between different versions of the source dataset and looking at their effect on the built dbt model.

Table of contents