Clone vs checkout
You can clone a Splitgraph repository using the sgr clone command:
$ sgr clone some_namespace/some_repository
This looks for the repository in all currently registered remotes and clones it from the first remote that contains it.
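The "first remote that contains it" rule can be pictured with a small Python sketch. This is a toy model rather than Splitgraph's implementation: the remote names, the `REMOTES` mapping and the `find_remote` helper are hypothetical, and the actual order in which sgr consults remotes is an assumption here.

```python
from typing import Dict, Optional, Set

# Hypothetical registered remotes: remote name -> repositories it hosts.
# Python dicts preserve insertion order, which stands in for lookup order.
REMOTES: Dict[str, Set[str]] = {
    "remote_a": {"other_namespace/other_repository"},
    "remote_b": {"some_namespace/some_repository"},
    "remote_c": {"some_namespace/some_repository"},
}


def find_remote(repository: str, remotes: Dict[str, Set[str]]) -> Optional[str]:
    """Return the name of the first remote hosting the repository, or None."""
    for name, repositories in remotes.items():
        if repository in repositories:
            return name
    return None


# remote_c also hosts the repository, but remote_b is checked first.
print(find_remote("some_namespace/some_repository", REMOTES))  # remote_b
```

In this model, a repository hosted on several remotes always resolves to whichever remote comes first in the lookup order.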
By default, cloning a Splitgraph repository works differently from cloning a Git repository: Splitgraph only clones the repository's metadata, which is lightweight compared to the actual data but still gives the user an overview of the repository.
For example, consider the 2016 US Presidential Election precinct-level returns dataset:
# This only transfers a few KB of metadata.
$ sgr clone splitgraph/2016_election
Gathering remote metadata...
Fetched metadata for 1 image, 1 table, 20 objects and 1 tag.

# How big is the actual image?
$ sgr show splitgraph/2016_election:latest
Image splitgraph/2016_election:3835145ada3f07cad99087d1b1071122d58c48783cbfe4694c101d35651fba90
Created at 2019-10-10T15:51:41.122370
Size: 26.75 MiB
No parent (root image)
Tables:
  precinct_results
Splitgraph tries to be as lazy as possible and only downloads the actual data when a query or a
checkout requires it. You can override this behavior by passing
On checkout of an image, Splitgraph gathers all objects required by that image, downloads them and assembles them into tables in a process called "materialization".
$ sgr checkout splitgraph/2016_election:latest
Need to download 20 objects (26.75 MiB), cache occupancy: 492.69 MiB/10.00 GiB
Fetching 20 objects, total size 26.75 MiB
Getting download URLs from registry PostgresEngine data.splitgraph.com (5a87...@data.splitgraph.com:5432/sgregistry)...
100%|███████████| 20/20 [00:12<00:00, 1.66obj/s, object=o1ccf32547...]
Checked out splitgraph/2016_election:3835145ada3f.
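The split between a metadata-only clone and a checkout that downloads and materializes data can be sketched as a toy model in Python. Everything here is hypothetical and for illustration only: `ToyImage` and `ToyEngine` are not part of the Splitgraph API, and assembling a table by concatenating object rows is a simplification of what real materialization does.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyImage:
    """Image metadata: table name -> IDs of the objects that make it up."""
    tables: Dict[str, List[str]]


@dataclass
class ToyEngine:
    """Hypothetical local engine with an object cache (not Splitgraph's API)."""
    remote_objects: Dict[str, List[tuple]]  # object ID -> rows, held remotely
    cache: Dict[str, List[tuple]] = field(default_factory=dict)
    downloads: int = 0

    def clone(self, image: ToyImage) -> ToyImage:
        # Cloning copies metadata only; no object data is fetched.
        return ToyImage(tables={t: list(ids) for t, ids in image.tables.items()})

    def checkout(self, image: ToyImage) -> Dict[str, List[tuple]]:
        # 1. Gather every object the image requires.
        required = [oid for ids in image.tables.values() for oid in ids]
        # 2. Download only the objects missing from the local cache.
        for oid in required:
            if oid not in self.cache:
                self.cache[oid] = self.remote_objects[oid]
                self.downloads += 1
        # 3. Assemble the objects into tables ("materialization").
        return {
            table: [row for oid in ids for row in self.cache[oid]]
            for table, ids in image.tables.items()
        }


engine = ToyEngine(remote_objects={"o1": [(1, "a")], "o2": [(2, "b")]})
remote_image = ToyImage(tables={"precinct_results": ["o1", "o2"]})

local_image = engine.clone(remote_image)
print(engine.downloads)  # 0: the clone fetched metadata only

tables = engine.checkout(local_image)
print(engine.downloads)  # 2: the checkout fetched both objects

engine.checkout(local_image)
print(engine.downloads)  # 2: cached objects are not downloaded again
```

The counter makes the laziness visible: data moves only when a checkout needs it, and objects already in the cache are reused.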