Metabase
You can connect a Metabase instance to a Splitgraph engine, allowing you to visualize datasets that were cloned and checked out on the engine. Metabase also works with layered querying, letting it query huge datasets by only downloading a required fraction of the data.
Make sure to use Metabase version v0.36.8 or above. Earler versions have an issue with the Metabase Explore Data view: it doesn't support schemata with slashes in them that Splitgraph uses by default to check data out into.
Start Metabase
Here is a sample Docker Compose service definition for Metabase running against a Splitgraph engine:
version: "3"
services:
engine:
image: splitgraph/engine:${DOCKER_TAG-stable-postgis}
ports:
- 5432:5432
environment:
- POSTGRES_USER=sgr
- POSTGRES_PASSWORD=supersecure
- POSTGRES_DB=splitgraph
- SG_LOGLEVEL=INFO
- SG_CONFIG_FILE=/.sgconfig
expose:
- 5432
volumes:
- ${HOME}/.splitgraph/.sgconfig:/.sgconfig
metabase:
image: metabase/metabase
ports:
- 3000:3000
You can also use a Splitgraph engine managed by sgr engine
: you will have to connect it to the Docker network set up by Metabase. For example:
$ docker network connect metabase_default splitgraph_engine_default
Configure connection
Go to localhost:3000
and set up your Metabase account.
Use engine
as the database hostname if you're using the Compose file or splitgraph_engine_default
(or the Docker container name) if you're using sgr engine
.
Browse Data issue
As discussed, the "Browse Data" functionality currently doesn't work on Splitgraph, since Splitgraph schemas contain slashes in them.
Write SQL
Click on "Write SQL" and select the Splitgraph database that you set up.
Sample datasets and queries
This sections contains some sample datasets that are hosted on Splitgraph and are ready for visualization in Metabase.
You can use layered querying to query these datasets, which
lazily downloads required fragments on the fly. For maximum performance, you can check the image
out into a real PostgreSQL table and create indexes on it (use sgr checkout
instead of sgr checkout --layered
)
Tweets of Congress
Tweets of Congress image on Splitgraph
$ sgr clone splitgraph/congress_tweets
$ sgr checkout --layered splitgraph/congress_tweets:latest
SELECT DATE(time) AS date,
COUNT(1) AS tweets
FROM "splitgraph/congress_tweets".tweets
WHERE text ILIKE '%coronavirus%'
AND time > '2020-01-01'
GROUP BY date;
WITH hourly_counts AS (
SELECT DATE(time) AS date,
EXTRACT(HOUR FROM time) AS hour,
COUNT(1) AS tweets
FROM "splitgraph/congress_tweets".tweets
GROUP BY date, hour
) SELECT hour, AVG(tweets)
FROM hourly_counts
GROUP BY hour;
Domestic US Flights
Domestic US Flights image on Splitgraph
$ sgr clone splitgraph/domestic_us_flights
$ sgr checkout --layered splitgraph/domestic_us_flights:latest
SELECT
origin_airport,
SUM(passengers) AS total_passengers
FROM "splitgraph/domestic_us_flights".flights
WHERE fly_month BETWEEN '2008-01-01' AND '2009-01-01'
GROUP BY origin_airport
ORDER BY total_passengers DESC;
2016 US Election votes
2016 US Election image on Splitgraph
$ sgr clone splitgraph/2016_election
$ sgr checkout --layered splitgraph/2016_election:latest
SELECT
state_postal,
SUM (CASE WHEN candidate_normalized = 'trump' THEN votes ELSE 0 END) / SUM(votes) * 100 AS trump_vote_percentage
FROM "splitgraph/2016_election".precinct_results
GROUP BY state_postal;
Geonames
$ sgr clone splitgraph/geonames
$ sgr checkout --layered splitgraph/geonames:latest
SELECT
name, latitude, longitude, elevation
FROM "splitgraph/geonames".all_countries
WHERE feature_code = 'PPL' AND elevation IS NOT NULL
ORDER BY elevation DESC
LIMIT 100;
SELECT
name, latitude, longitude, country_code
FROM "splitgraph/geonames".all_countries
WHERE feature_code = 'PPL' AND name = 'Cambridge'
London Wards
London Wards image on Splitgraph
This dataset is PostGIS-enabled. Whilst Metabase doesn't support PostGIS columns directly, you can convert them to latitude and longitude and show them using Metabase's Pin visualization.
$ sgr engine upgrade --image splitgraph/engine:stable-postgis
$ sgr sql "CREATE EXTENSION IF NOT EXISTS postgis"
$ sgr clone splitgraph/london_wards
$ sgr checkout --layered splitgraph/london_wards:latest
WITH ward_centres AS (
SELECT name, ST_Centroid(ST_Transform(ST_SetSRID(geom, 27700), 4326)) AS centroid
FROM "splitgraph/london_wards".city_merged_2018
)
SELECT name, ST_Y(centroid) AS lat, ST_X(centroid) AS lon
FROM ward_centres;