Metabase

You can connect a Metabase instance to a Splitgraph engine, allowing you to visualize datasets that were cloned and checked out on the engine. Metabase also works with layered querying, letting it query huge datasets by only downloading a required fraction of the data.

However, there is currently an issue with Metabase Explore Data view: it doesn't support schemata with slashes in them that Splitgraph uses by default to check data out into. There's an open GitHub issue for this.

Start Metabase

Here is a sample Docker Compose service definition for Metabase running against a Splitgraph engine:

version: "3"
services:
  engine:
    image: splitgraph/engine:${DOCKER_TAG-stable-postgis}
    ports:
      - 5432:5432
    environment:
      - POSTGRES_USER=sgr
      - POSTGRES_PASSWORD=supersecure
      - POSTGRES_DB=splitgraph
      - SG_LOGLEVEL=INFO
      - SG_CONFIG_FILE=/.sgconfig
    expose:
      - 5432
    volumes:
      - ${HOME}/.splitgraph/.sgconfig:/.sgconfig
  metabase:
    image: metabase/metabase
    ports:
      - 3000:3000

You can also use a Splitgraph engine managed by sgr engine: you will have to connect it to the Docker network set up by Metabase. For example:

$ docker network connect metabase_default splitgraph_engine_default

Configure connection

Go to localhost:3000 and set up your Metabase account.

Set up Metabase account

Use engine as the database hostname if you're using the Compose file or splitgraph_engine_default (or the Docker container name) if you're using sgr engine.

Set up PostgreSQL connection

Browse Data issue

As discussed, the "Browse Data" functionality currently doesn't work on Splitgraph, since Splitgraph schemas contain slashes in them.

Browse Data dialog

Write SQL

Click on "Write SQL" and select the Splitgraph database that you set up.

Click on Write SQL

Select Splitgraph database

Sample datasets and queries

This sections contains some sample datasets that are hosted on Splitgraph and are ready for visualization in Metabase.

You can use layered querying to query these datasets, which lazily downloads required fragments on the fly. For maximum performance, you can check the image out into a real PostgreSQL table and create indexes on it (use sgr checkout instead of sgr checkout --layered)

Tweets of Congress

Tweets of Congress image on Splitgraph

$ sgr clone splitgraph/congress_tweets
$ sgr checkout --layered splitgraph/congress_tweets:latest
SELECT DATE(time) AS date,
    COUNT(1) AS tweets
FROM "splitgraph/congress_tweets".tweets
WHERE text ILIKE '%coronavirus%'
    AND time > '2020-01-01'
GROUP BY date;

Congress Tweets query example

WITH hourly_counts AS (
    SELECT DATE(time) AS date,
    EXTRACT(HOUR FROM time) AS hour,
    COUNT(1) AS tweets
    FROM "splitgraph/congress_tweets".tweets
    GROUP BY date, hour
) SELECT hour, AVG(tweets)
    FROM hourly_counts
    GROUP BY hour;

Congress Tweets query example

Domestic US Flights

Domestic US Flights image on Splitgraph

$ sgr clone splitgraph/domestic_us_flights
$ sgr checkout --layered splitgraph/domestic_us_flights:latest
SELECT
    origin_airport,
    SUM(passengers) AS total_passengers
FROM "splitgraph/domestic_us_flights".flights
WHERE fly_month BETWEEN '2008-01-01' AND '2009-01-01'
GROUP BY origin_airport
ORDER BY total_passengers DESC;

Domestic US flights query example

2016 US Election votes

2016 US Election image on Splitgraph

$ sgr clone splitgraph/2016_election
$ sgr checkout --layered splitgraph/2016_election:latest
SELECT
    state_postal,
    SUM (CASE WHEN candidate_normalized = 'trump' THEN votes ELSE 0 END) / SUM(votes) * 100 AS trump_vote_percentage
FROM "splitgraph/2016_election".precinct_results
GROUP BY state_postal;

2016 US Election query example

Geonames

Geonames image on Splitgraph

$ sgr clone splitgraph/geonames
$ sgr checkout --layered splitgraph/geonames:latest
SELECT
    name, latitude, longitude, elevation
FROM "splitgraph/geonames".all_countries
WHERE feature_code = 'PPL' AND elevation IS NOT NULL
ORDER BY elevation DESC
LIMIT 100;

Geonames query example

SELECT
    name, latitude, longitude, country_code
FROM "splitgraph/geonames".all_countries
WHERE feature_code = 'PPL' AND name = 'Cambridge'

Geonames query example

London Wards

London Wards image on Splitgraph

This dataset is PostGIS-enabled. Whilst Metabase doesn't support PostGIS columns directly, you can convert them to latitude and longitude and show them using Metabase's Pin visualization.

$ sgr engine upgrade --image splitgraph/engine:stable-postgis
$ sgr sql "CREATE EXTENSION IF NOT EXISTS postgis"
$ sgr clone splitgraph/london_wards
$ sgr checkout --layered splitgraph/london_wards:latest
WITH ward_centres AS (
    SELECT name, ST_Centroid(ST_Transform(ST_SetSRID(geom, 27700), 4326)) AS centroid
    FROM "splitgraph/london_wards".city_merged_2018
)
SELECT name, ST_Y(centroid) AS lat, ST_X(centroid) AS lon
FROM ward_centres;

London Wards Election query example