Inspect image metadata using the
$ sgr show namespace/repository:image_hash
Inspect tables in a given image using the
sgr table command:
$ sgr table namespace/repository:image_hash table_name
Inspect object metadata with the
sgr object command:
$ sgr object object_id
These commands require you to clone the repository (but not its actual data).
It's also possible to inspect the metadata on a remote repository without
cloning any images by passing the
--remote flag to these commands:
$ sgr show -r data.splitgraph.com splitgraph/geonames:latest Image splitgraph/geonames:0b77a102cbabe2de8597dcab71f1c333f16f9e1963c9c8982988e10e65989c62 Created at 2019-12-11T12:20:08.447974 Size: 539.95 MiB Parent: 0000000000000000000000000000000000000000000000000000000000000000 Tables: all_countries
Let's further inspect the object that the first example image,
$ sgr object o26c6d8345cba276f807d7bcf906531568f309c2609a3420d98c01a6c99b166 Object ID: o26c6d8345cba276f807d7bcf906531568f309c2609a3420d98c01a6c99b166 Namespace: example Format: FRAG Size: 963.00 B Created: 2020-04-06 10:17:15.650187 Rows inserted: 10 Insertion hash: 37be08d7c1aca256f5e64860afe7db21e6f7c47b00cdf23f65b84576bc209f41 Rows deleted: 0 Deletion hash: 0000000000000000000000000000000000000000000000000000000000000000 Column index: key: [0, 9] value: ['19581e27de7ced00ff1ce50b2047e7a567c76b1cbaebabe5ef03f7c3017bb5b7', 'ef2d127de37b942baad06145e54b0c619a1f22327b2ebbcfbec78f5564afe39d']
There are a few interesting things in this object's metadata. Firstly, it's
stored in the
FRAG format. A Splitgraph table consists of multiple fragments
that can partially overwrite each other.
Every fragment also has a column index: this is a metadata entry showing, for every column, the minimum and maximum values that this fragment affects (deletes or inserts). This is used when querying large remote Splitgraph tables to not download fragments that definitely don't match a given query. For qualifiers on sorted columns (like the primary key), this can mean only downloading one or two fragments to satisfy a query to a table composed of hundreds.
Finally, every fragment also has an insertion and a deletion hash: the insertion hash is the hash of all the rows that this fragment inserts and the deletion hash is the hash of all the rows that this fragment deletes. These hashes are homomorphic (every row is hashed individually and then the hashes are summed up) and are similar to Facebook's LtHash. They have a really important property: the sum of all hashes of individual fragments composing a table is equal to the content hash of a whole table. This is used in deduplicating data and optimizing storage.
sgr generates the deterministic object ID by combining the content hash
insertion - deletion hash) and the hash of the schema of a given object. This
means that Splitgraph fragments are content-addressable.
Now, let's look at the second object that the new version of the
demo table in
example/repo_2:new_data is linked to.
$ sgr object o4882c12cdc4cb5b4e89481ebdb71585998633f19a72652debbb80980ae0f0b Object ID: o4882c12cdc4cb5b4e89481ebdb71585998633f19a72652debbb80980ae0f0b Namespace: example Format: FRAG Size: 552.00 B Created: 2020-04-06 10:18:11.466667 Rows inserted: 4 Insertion hash: aa15011c4af8e7afb877b5de68ab782b69d63720ba5787bec54fbcd6838fb377 Rows deleted: 4 Deletion hash: 7463a64e15187558e6e1096368033dfab422b34503c20e94a83b7a3e4650df68 Column index: key: [0, 11] value: ['4a44dc15364204a80fe80e9039455cc1608281820fe2b24f1e5233ade6af1dd5', 'd4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35_UPDATED'] Location: created locally
As you can see, this new object is based on the object that was used by the very
first version of the data we generated
o4882c12cdc4cb5b4e89481ebdb71585998633f19a72652debbb80980ae0f0b). This is
because the demo data is static and so the first version of the
demo table in
example/repo_2 is in fact the same.
Since this object updates and deletes some rows, their values are included in its deletion hash, which is now non-zero.
Finally, the column index now spans all the values of the
key column that were
replaced or added. Hence, when there's a query against this table that is inside
that range, both fragments will be fetched to check if they match that query.
We can also inspect the actual object: every object in the
sgr cache is stored
as a CStore columnar storage file.
$ sgr sql "SELECT * FROM splitgraph_meta.o4882c12cdc4cb5b4e89481ebdb71585998633f19a72652debbb80980ae0f0b" 2 d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35_UPDATED True 3 4e07408562bedb8b60ce05c1decfe3ad16b72230967de01f640b7e4729b49fce_UPDATED True 10 4a44dc15364204a80fe80e9039455cc1608281820fe2b24f1e5233ade6af1dd5 True 11 4fc82b26aecb47d2868c4efbe3581732a3e7cbcc6c2efb32062c08170a05eeb8 True 0 None False 1 None False
sgr stores the object with the same schema as the original table, plus a
Boolean flag showing whether a row has been updated/inserted (upserted) or
deleted. For deleted rows,
sgr only records the primary key, and pads the rest
of the table with
sgr does not always store objects in the engine. When you push an image to an
external location, like Splitgraph,
sgr usually uploads them to an
S3-compatible storage location.
sgr only downloads the actual objects and
loads them into Postgres when you check out an image containing the objects.