splitgraph.engine package

Module contents

Defines the interface for a Splitgraph engine (a backing database), including running basic SQL commands, tracking tables for changes and uploading/downloading tables to other remote engines.

By default, Splitgraph is backed by Postgres: see splitgraph.engine.postgres for an example of how to implement a different engine.

class splitgraph.engine.ChangeEngine

Bases: splitgraph.engine.SQLEngine, abc.ABC

An SQL engine that can perform change tracking on a set of tables.

discard_pending_changes(schema, table=None)

Discard recorded pending changes for a tracked table or the whole schema

get_change_key(schema, table)

Returns the key used to identify a row in a change (list of column name, column type). If the tracked table has a PK, we use that; if it doesn’t, the whole row is used.

get_changed_tables(schema)

List tracked tables that have pending changes

Parameters:schema – Schema to check for changes
Returns:List of tables with changed contents
get_pending_changes(schema, table, aggregate=False)

Return pending changes for a given tracked table

Parameters:
  • schema – Schema the table belongs to
  • table – Table to return changes for
  • aggregate – Whether to aggregate changes or return them completely
Returns:

If aggregate is True: tuple with numbers of (added_rows, removed_rows, updated_rows). If aggregate is False: A changeset. The changeset is a list of (pk, action (0 for Insert, 1 for Delete, 2 for Update), action_data) where action_data is None for Delete and {‘c’: [column_names], ‘v’: [column_values]} that have been inserted/updated otherwise.

get_tracked_tables()
Returns:A list of (table_schema, table_name) that the engine currently tracks for changes
has_pending_changes(schema)

Return True if the tracked schema has pending changes and False if it doesn’t.

track_tables(tables)

Start engine-specific change tracking on a list of tables.

Parameters:tables – List of (table_schema, table_name) to start tracking
untrack_tables(tables)

Stop engine-specific change tracking on a list of tables and delete any pending changes.

Parameters:tables – List of (table_schema, table_name) to start tracking
class splitgraph.engine.ObjectEngine

Bases: object

Routines for storing/applying objects as well as sharing them with other engines.

apply_fragments(objects, target_schema, target_table, extra_quals=None, extra_qual_args=None)

Apply multiple fragments to a target table as a single-query batch operation.

Parameters:
  • objects – List of tuples (object_schema, object_table) that the objects are stored in.
  • target_schema – Schema to apply the fragment to
  • target_table – Table to apply the fragment to
  • extra_quals – Optional, extra SQL (Composable) clauses to filter new rows in the fragment on (e.g. SQL(“a = %s”))
  • extra_qual_args – Optional, a tuple of arguments to use with extra_quals
download_objects(objects, remote_engine)

Download objects from the remote engine to the local cache

Parameters:
  • objects – List of object IDs to download
  • remote_engine – A remote ObjectEngine to download the objects from.

:return List of object IDs that were downloaded.

dump_object(schema, table, stream)

Dump a table to a stream using an engine-specific binary format.

Parameters:
  • schema – Schema the table lives in
  • table – Table to dump
  • stream – A file-like stream to dump the object into
load_object(schema, table, stream)

Load a table from a stream using an engine-specific binary format.

Parameters:
  • schema – Schema to create the table in. Must already exist.
  • table – Table to create. Must not exist.
  • stream – A file-like stream to load the object from
store_fragment(inserted, deleted, schema, table, source_schema, source_table)

Store a fragment of a changed table in another table

Parameters:
  • inserted – List of PKs that have been updated/inserted
  • deleted – List of PKs that have been deleted
  • schema – Schema to store the change in
  • table – Table to store the change in
  • source_schema – Schema the source table is located in
  • source_table – Name of the source table
upload_objects(objects, remote_engine)

Upload objects from the local cache to the remote engine

Parameters:
  • objects – List of object IDs to upload
  • remote_engine – A remote ObjectEngine to upload the objects to.
class splitgraph.engine.ResultShape

Bases: enum.Enum

Shape that the result of a query will be coerced to

MANY_MANY = 4
MANY_ONE = 3
NONE = 0
ONE_MANY = 2
ONE_ONE = 1
class splitgraph.engine.SQLEngine

Bases: abc.ABC

Abstraction for a Splitgraph SQL backend. Requires any overriding classes to implement run_sql as well as a few other functions. Together with the information_schema (part of the SQL standard), this class uses those functions to implement some basic database management methods like listing, deleting, creating, dumping and loading tables.

close()

Commit and close the engine’s backing connection

commit()

Commit the engine’s backing connection

copy_table(source_schema, source_table, target_schema, target_table, with_pk_constraints=True, limit=None, offset=None, order_by_pk=False)

Copy a table in the same engine, optionally applying primary key constraints as well.

create_schema(schema)

Create a schema if it doesn’t exist

create_table(schema, table, schema_spec)

Creates a table using a previously-dumped table schema spec

Parameters:
  • schema – Schema to create the table in
  • table – Table name to create
  • schema_spec – A list of (ordinal_position, column_name, data_type, is_pk) specifying the table schema
delete_schema(schema)

Delete a schema if it exists, including all the tables in it.

delete_table(schema, table)

Drop a table from a schema if it exists

dump_table_creation(schema, tables, created_schema=None)

Dumps the basic table schema (column names, data types, is_nullable) for one or more tables into SQL statements.

Parameters:
  • schema – Schema to dump tables from
  • tables – Tables to dump
  • created_schema – If not None, specifies the new schema that the tables will be created under.
Returns:

An SQL statement that reconstructs the schema for the given tables.

dump_table_sql(schema, table_name, stream, columns='*', where='', where_args=None)

Dump the table contents in the SQL format :param schema: Schema the table is located in :param table_name: Name of the table :param stream: A file-like object to write the result into. :param columns: SQL column spec. Default ‘*’. :param where: Optional, an SQL WHERE clause :param where_args: Arguments for the optional WHERE clause.

get_all_tables(schema)

Get all tables in a given schema.

get_column_names(schema, table_name)

Returns a list of all columns in a given table.

get_column_names_types(schema, table_name)

Returns a list of (column, type) in a given table.

get_full_table_schema(schema, table_name)

Generates a list of (column ordinal, name, data type, is_pk), used to detect schema changes like columns being dropped/added/renamed or type changes.

get_primary_keys(schema, table)

Get a list of (column_name, column_type) denoting the primary keys of a given table.

get_table_size(schema, table)

Return the table disk usage, in bytes.

get_table_type(schema, table)

Get the type of the table (BASE or FOREIGN)

initialize()

Does any required initialization of the engine

lock_table(schema, table)

Acquire an exclusive lock on a given table, released when the transaction commits / rolls back.

rollback()

Rollback the engine’s backing connection

run_sql(statement, arguments=None, return_shape=<ResultShape.MANY_MANY: 4>)

Run an arbitrary SQL statement with some arguments, return an iterator of results. If the statement doesn’t return any results, return None.

run_sql_batch(statement, arguments, schema=None)

Run a parameterized SQL statement against multiple sets of arguments. Other engines can override if they support a more efficient batching mechanism.

run_sql_in(schema, sql, arguments=None, return_shape=<ResultShape.MANY_MANY: 4>)

Executes a non-schema-qualified query against a specific schema.

Parameters:
  • schema – Schema to run the query in
  • sql – Query
  • arguments – Query arguments
  • return_shape – ReturnShape to coerce the result into.
schema_exists(schema)

Check if a schema exists on the engine.

Parameters:schema – Schema name
table_exists(schema, table_name)

Check if a table exists on the engine.

Parameters:
  • schema – Schema name
  • table_name – Table name
splitgraph.engine.get_engine(name=None, use_socket=False)

Get the current global engine or a named remote engine

Parameters:
  • name – Name of the remote engine as specified in the config. If None, the current global engine is returned.
  • use_socket – Use a local UNIX socket instead of PG_HOST, PG_PORT for LOCAL engine connections.
splitgraph.engine.get_remote_connection_params(remote_name)

Get connection parameters for a Splitgraph remote.

Parameters:remote_name – Name of the remote. Must be specified in the config file.
Returns:A tuple of (hostname, port, username, password, database)
splitgraph.engine.switch_engine(engine)

Switch the global engine to a different one. The engine will get switched back on exit from the context manager.

Parameters:engine – Name of the engine or an SQLEngine instance