splitgraph.core package

Module contents

Core Splitgraph functionality: versioning and sharing tables.

The main point of interaction with the Splitgraph API is a splitgraph.core.repository.Repository object representing a local or a remote Splitgraph repository. Repositories can be created using one of the following methods:

  • Directly by invoking Repository(namespace, name, engine) where engine is the engine that the repository belongs to (which can be obtained with get_engine(engine_name)). If the created repository doesn’t actually exist on the engine, it must first be initialized with repository.init().

  • By using splitgraph.core.engine.lookup_repository() which will search for the repository on the current lookup path.
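In code, the two approaches look roughly like this (a sketch: the namespace and repository names are placeholders, and the import path for get_engine is an assumption not stated in this reference):

```python
# Hypothetical usage sketch -- names are placeholders.
from splitgraph.engine import get_engine  # assumed import path
from splitgraph.core.engine import lookup_repository, repository_exists
from splitgraph.core.repository import Repository

# Approach 1: construct the Repository directly against an engine...
repo = Repository("my_namespace", "my_repo", engine=get_engine())

# ...and initialize it if it doesn't exist on the engine yet.
if not repository_exists(repo):
    repo.init()

# Approach 2: search the current lookup path for the repository.
repo = lookup_repository("my_namespace/my_repo", include_local=True)
```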

Submodules

splitgraph.core.engine module

Routines for managing Splitgraph engines, including looking up repositories and managing objects.

splitgraph.core.engine.get_current_repositories(engine: PostgresEngine) → List[Tuple[Repository, Optional[Image]]]

Lists all repositories currently in the engine.

Parameters

engine – Engine

Returns

List of (Repository object, current HEAD image)

splitgraph.core.engine.init_engine(skip_object_handling: bool = False) → None

Initializes the engine by:

  • performing any required engine-custom initialization

  • creating the metadata tables

Parameters

skip_object_handling – If True, skips installing routines related to object handling and checkouts (like audit triggers and CStore management).

splitgraph.core.engine.lookup_repository(name: str, include_local: bool = False) → Repository

Queries the Splitgraph engines on the lookup path to locate one hosting the given repository.

Parameters
  • name – Repository name

  • include_local – If True, also queries the local engine

Returns

Local or remote Repository object

splitgraph.core.engine.repository_exists(repository: Repository) → bool

Checks if a repository exists on the engine.

Parameters

repository – Repository object

splitgraph.core.image module

Image representation and provenance

class splitgraph.core.image.Image(image_hash: str, parent_id: Optional[str], created: datetime.datetime, comment: str, provenance_data: List[Dict[str, Union[str, List[str], List[bool], List[Dict[str, str]]]]], repository: Repository)

Bases: tuple

Represents a Splitgraph image. Shouldn’t be created directly; use the Image-loading methods in the splitgraph.core.repository.Repository class instead.

checkout(*args, **kwargs) → None

Checks the image out, changing the current HEAD pointer. Raises an error if there are pending changes to its checkout.

Parameters
  • force – Discards all pending changes to the schema.

  • layered – If True, uses layered querying to check out the image (doesn’t materialize tables inside of it).

property comment

Alias for field number 3

property created

Alias for field number 2

delete_tag(tag: str) → None

Deletes a tag from an image.

Parameters

tag – Tag to delete.

property engine

get_log() → List[splitgraph.core.image.Image]

Repeatedly gets the parent of a given image until it reaches the bottom.
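Conceptually this is a parent-pointer walk; a minimal model over plain dictionaries (the image hashes here are invented):

```python
def get_log(images, image_hash):
    """Walk parent pointers from image_hash down to the base image."""
    log = []
    current = image_hash
    while current is not None:
        log.append(current)
        current = images[current]  # parent_id, or None at the base image
    return log

# A three-image chain: c -> b -> a (a is the initial image).
images = {"a": None, "b": "a", "c": "b"}
print(get_log(images, "c"))  # ['c', 'b', 'a']
```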

get_parent_children() → Tuple[Optional[str], List[str]]

Gets the parent and a list of children of a given image.

get_size() → int

Get the physical size used by the image’s objects (including those that might be shared with other images).

This is calculated from the metadata, the on-disk footprint might be smaller if not all of image’s objects have been downloaded.

Returns

Size of the image in bytes.

get_table(table_name: str) → splitgraph.core.table.Table

Returns a Table object representing a version of a given table. Contains a list of objects that the table is linked to and the table’s schema.

Parameters

table_name – Name of the table

Returns

Table object

get_tables() → List[str]

Gets the names of all tables inside of an image.

get_tags()

Lists all tags that this image has.

property image_hash

Alias for field number 0

property object_engine

property parent_id

Alias for field number 1

provenance(reverse=False, engine=None) → List[Tuple[Repository, str]]

Inspects the image’s parent chain to come up with a set of repositories and their hashes that it was created from.

If reverse is True, returns a list of images that were created _from_ this image. If this image is on a remote repository, engine can be passed in to override the engine used for the lookup of dependents.

Returns

List of (repository, image_hash)

property provenance_data

Alias for field number 4

query_schema(wrapper: Optional[str] = 'splitgraph.core.fdw_checkout.QueryingForeignDataWrapper') → Iterator[str]

Creates a temporary schema with tables in this image mounted as foreign tables that can be accessed via read-only layered querying. On exit from the context manager, the schema is discarded.

Returns

The name of the schema the image is located in.

property repository

Alias for field number 5

set_provenance(provenance_data: List[Dict[str, Union[str, List[str], List[bool], List[Dict[str, str]]]]]) → None

Sets the image’s provenance. Internal function called by the Splitfile interpreter, shouldn’t be called directly as it changes the image after it’s been created.

Parameters

provenance_data – List of parsed Splitfile commands and their data.

tag(tag: str) → None

Tags a given image. All tags are unique inside of a repository. If a tag already exists, it’s removed from the previous image and given to the new image.

Parameters

tag – Tag to set. ‘latest’ and ‘HEAD’ are reserved tags.
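Since tags are unique within a repository, tagging behaves like an upsert keyed on the tag name. A toy model of that rule (not the actual implementation):

```python
def set_tag(tags, tag, image_hash):
    """Point `tag` at `image_hash`, moving it off any previous image."""
    if tag in ("latest", "HEAD"):
        raise ValueError("'%s' is a reserved tag" % tag)
    tags[tag] = image_hash  # a tag maps to exactly one image per repository

tags = {"v1": "abcdef"}
set_tag(tags, "v1", "123456")  # 'v1' moves from abcdef to 123456
```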

to_splitfile(ignore_irreproducible: bool = False, source_replacement: Optional[Dict[Repository, str]] = None) → List[str]

Recreate the Splitfile that can be used to reconstruct this image.

Parameters
  • ignore_irreproducible – If True, ignore commands from irreproducible Splitfile lines (like MOUNT or custom commands) and instead emit a comment (this results in an invalid Splitfile).

  • source_replacement – A dictionary of repositories and image hashes/tags specifying how to replace the dependencies of this Splitfile (table imports and FROM commands).

Returns

A list of Splitfile commands that can be fed back into the executor.

splitgraph.core.image.reconstruct_splitfile(provenance_data: List[Dict[str, Union[str, List[str], List[bool], List[Dict[str, str]]]]], ignore_irreproducible: bool = False, source_replacement: Optional[Dict[Repository, str]] = None) → List[str]

Recreate the Splitfile that can be used to reconstruct an image.

splitgraph.core.object_manager module

Functions related to creating, deleting and keeping track of physical Splitgraph objects.

class splitgraph.core.object_manager.ObjectManager(object_engine: PostgresEngine, metadata_engine: Optional[PostgresEngine] = None)

Bases: splitgraph.core.fragment_manager.FragmentManager

Brings the multiple manager classes together and manages the object cache (downloading and uploading objects as required in order to fulfill certain queries)

cleanup() → List[str]

Deletes all objects in the object_tree not required by any current repository, including their dependencies and their remote locations. Also deletes all objects not registered in the object_tree.

download_objects(source: Optional[ObjectManager], objects_to_fetch: List[str], object_locations: List[Tuple[str, str, str]]) → List[str]

Fetches the required objects from the remote and stores them locally. Does nothing for objects that already exist.

Parameters
  • source – Remote ObjectManager. If None, will only try to download objects from the external location.

  • objects_to_fetch – List of object IDs to download.

  • object_locations – List of custom object locations, encoded as tuples (object_id, object_url, protocol).

ensure_objects(table: Optional[Table], objects: Optional[List[str]] = None, quals: Optional[Sequence[Sequence[Tuple[str, str, Any]]]] = None, defer_release: bool = False, tracer: Optional[splitgraph.core.common.Tracer] = None, upstream_manager: Optional[ObjectManager] = None) → Iterator[Union[List[str], Tuple[List[str], splitgraph.core.common.CallbackList]]]

Resolves the objects needed to materialize a given table and makes sure they are in the local splitgraph_meta schema.

Whilst inside this manager, the objects are guaranteed to exist. On exit from it, the objects are marked as unneeded and can be garbage collected.

Parameters
  • table – Table to materialize

  • objects – List of objects to download: one of table or objects must be specified.

  • quals – Optional list of qualifiers to be passed to the fragment engine. Fragments that definitely do not match these qualifiers will be dropped. See the docstring for filter_fragments for the format.

  • defer_release – If True, won’t release the objects on exit.

Returns

If defer_release is True: List of table fragments and a callback that the caller must call when the objects are no longer needed. If defer_release is False: just the list of table fragments.
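The pin/release lifecycle can be modelled as reference counting around a context manager; a simplified sketch (the refcount table and object IDs are invented, not ObjectManager internals):

```python
from contextlib import contextmanager

refcounts = {}  # object_id -> number of active users

@contextmanager
def ensure_objects(objects):
    """Pin objects for the duration of the block, then release them."""
    for o in objects:
        refcounts[o] = refcounts.get(o, 0) + 1  # guaranteed alive while > 0
    try:
        yield objects
    finally:
        for o in objects:
            refcounts[o] -= 1  # eligible for eviction once back at zero

with ensure_objects(["o1", "o2"]) as objs:
    assert all(refcounts[o] == 1 for o in objs)
# On exit, the objects are unpinned and can be garbage collected.
```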

get_cache_occupancy() → int

Returns

Space occupied by objects cached from external locations, in bytes.

get_downloaded_objects(limit_to: Optional[List[str]] = None) → List[str]

Gets a list of objects currently in the Splitgraph cache (i.e. stored locally rather than only existing externally).

Parameters

limit_to – If specified, only the objects in this list will be returned.

Returns

List of object IDs.

get_total_object_size()

Returns

Space occupied by all objects on the engine, in bytes.

make_objects_external(objects: List[str], handler: str, handler_params: Dict[Any, Any]) → None

Uploads local objects to an external location and marks them as being cached locally (thus making it possible to evict or swap them out).

Parameters
  • objects – Object IDs to upload. Will do nothing for objects that already exist externally.

  • handler – Object handler

  • handler_params – Extra handler parameters

run_eviction(keep_objects: List[str], required_space: Optional[int] = None) → None

Deletes enough objects with a zero reference count (only those, since an object is guaranteed to stay alive whilst its refcount is greater than zero) to free at least required_space in the cache.

Parameters
  • keep_objects – List of objects (besides those with nonzero refcount) that can’t be deleted.

  • required_space – Space, in bytes, to free. If the routine can’t free at least this much space, it shall raise an exception. If None, removes all eligible objects.
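The eviction rule can be sketched as a pass over a refcount table (object IDs, sizes and the table layout here are invented for illustration):

```python
def run_eviction(objects, keep_objects, required_space=None):
    """Delete zero-refcount objects until at least required_space is freed.

    `objects` maps object_id -> (size_bytes, refcount).
    """
    freed = 0
    for object_id, (size, refcount) in sorted(objects.items()):
        if refcount > 0 or object_id in keep_objects:
            continue  # pinned or explicitly kept objects are never evicted
        del objects[object_id]
        freed += size
        if required_space is not None and freed >= required_space:
            return freed
    if required_space is not None and freed < required_space:
        raise RuntimeError("Could not free %d bytes" % required_space)
    return freed

objects = {"o1": (100, 0), "o2": (200, 1), "o3": (300, 0)}
print(run_eviction(objects, keep_objects=set(), required_space=100))  # 100
```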

upload_objects(target: splitgraph.core.object_manager.ObjectManager, objects_to_push: List[str], handler: str = 'DB', handler_params: Optional[Dict[Any, Any]] = None) → Sequence[Tuple[str, Optional[str]]]

Uploads physical objects to the remote or some other external location.

Parameters
  • target – Target ObjectManager

  • objects_to_push – List of object IDs to upload.

  • handler – Name of the handler to use to upload objects. Use DB to push them to the remote, FILE to store them in a directory that can be accessed from the client, or HTTP to upload them over HTTP.

  • handler_params – For HTTP, a dictionary {"username": username, "password": password}. For FILE, a dictionary {"path": path} specifying the directory where the objects shall be saved.

Returns

A list of (object_id, url) tuples for all objects that were uploaded (skipping objects that already exist on the remote).

splitgraph.core.registry module

Functions for communicating with the remote Splitgraph catalog

splitgraph.core.registry.get_info_key(engine: PostgresEngine, key: str) → Optional[str]

Gets a configuration key from the remote registry, used to notify the client of the registry’s capabilities.

Parameters
  • engine – Engine

  • key – Key to get

splitgraph.core.registry.set_info_key(engine: PostgresEngine, key: str, value: Union[bool, str]) → None

Sets a configuration value on the remote registry.

Parameters
  • engine – Engine

  • key – Key to set

  • value – New value for the key

splitgraph.core.registry.setup_registry_mode(engine: PostgresEngine) → None

Set up access policies/RLS:

  • Normal users aren’t allowed to create tables/schemata (can’t do checkouts inside of a registry or upload SG objects directly to it)

  • Normal users can’t access the splitgraph_meta schema directly: they’re only supposed to be able to talk to it via stored procedures in splitgraph_api. Those procedures are set up with SECURITY INVOKER (run with those users’ credentials) and what they can access is further restricted by RLS:

    • images/tables/tags meta tables: can only create/update/delete records where the namespace = user ID

    • objects/object_location tables: same. An object (piece of data) becomes owned by the user that creates it and still remains so even if someone else’s image starts using it. Hence, the original owner can delete or change it (since they control the external location they’ve uploaded it to anyway).

splitgraph.core.repository module

Public API for managing images in a Splitgraph repository.

class splitgraph.core.repository.Repository(namespace: str, repository: str, engine: Optional[splitgraph.engine.postgres.engine.PostgresEngine] = None, object_engine: Optional[splitgraph.engine.postgres.engine.PostgresEngine] = None, object_manager: Optional[splitgraph.core.object_manager.ObjectManager] = None)

Bases: object

Splitgraph repository API

commit(image_hash: Optional[str] = None, comment: Optional[str] = None, snap_only: bool = False, chunk_size: Optional[int] = None, split_changeset: bool = False, extra_indexes: Optional[Dict[str, Dict[str, Union[List[str], Dict[str, Dict[str, Any]]]]]] = None, in_fragment_order: Optional[Dict[str, List[str]]] = None, overwrite: bool = False) → splitgraph.core.image.Image

Commits all pending changes to a given repository, creating a new image.

Parameters
  • image_hash – Hash of the commit. Chosen by random if unspecified.

  • comment – Optional comment to add to the commit.

  • snap_only – If True, will store the table as a full snapshot instead of using delta compression.

  • chunk_size – For tables that are stored as snapshots (new tables and tables where snap_only has been passed), the table will be split into fragments of this many rows.

  • split_changeset – If True, splits the changeset into multiple fragments based on the PK regions spanned by the current table fragments. For example, if the original table consists of 2 fragments, first spanning rows 1-10000, second spanning rows 10001-20000 and the change alters rows 1, 10001 and inserts a row with PK 20001, this will record the change as 3 fragments: one inheriting from the first original fragment, one inheriting from the second and a brand new fragment. This increases the number of fragments in total but means that fewer rows will need to be scanned to satisfy a query. If False, the changeset will be stored as a single fragment inheriting from the last fragment in the table.

  • extra_indexes – Dictionary of {table: index_type: column: index_specific_kwargs}.

  • in_fragment_order – Dictionary of {table: list of columns}. If specified, will sort the data inside each chunk by this key (or keys) for each table.

  • overwrite – If an object already exists, will force recreate it.

Returns

The newly created Image object.
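The split_changeset behaviour can be illustrated by bucketing changed primary keys into the PK regions spanned by the existing fragments (a toy model; the ranges and PKs match the example given for split_changeset above):

```python
def split_changeset(fragment_ranges, changed_pks):
    """Assign each changed PK to the existing fragment whose PK range
    covers it, or to a brand new fragment if none does.

    `fragment_ranges` is a list of (min_pk, max_pk) per fragment.
    """
    buckets = {i: [] for i in range(len(fragment_ranges))}
    buckets["new"] = []
    for pk in changed_pks:
        for i, (lo, hi) in enumerate(fragment_ranges):
            if lo <= pk <= hi:
                buckets[i].append(pk)
                break
        else:
            buckets["new"].append(pk)  # PK outside all existing regions
    return buckets

# Two fragments spanning rows 1-10000 and 10001-20000:
print(split_changeset([(1, 10000), (10001, 20000)], [1, 10001, 20001]))
# {0: [1], 1: [10001], 'new': [20001]}
```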

commit_engines() → None

Commit the underlying transactions on both engines that the repository uses.

delete(unregister: bool = True, uncheckout: bool = True) → None

Discards all changes to a given repository and optionally all of its history, as well as deleting the Postgres schema that it might be checked out into. Doesn’t delete any cached physical objects.

After performing this operation, this object becomes invalid and must be discarded, unless init() is called again.

Parameters
  • unregister – Whether to purge repository history/metadata

  • uncheckout – Whether to delete the actual checked out repo. This has no effect if the repository is backed by a registry (rather than a local engine).

diff(table_name: str, image_1: Union[splitgraph.core.image.Image, str], image_2: Optional[Union[splitgraph.core.image.Image, str]], aggregate: bool = False) → Optional[Union[bool, Tuple[int, int, int], List[Tuple[bool, Tuple]]]]

Compares the state of a table in different images by materializing both tables into a temporary space and comparing them row-to-row.

Parameters
  • table_name – Name of the table.

  • image_1 – First image hash / object. If None, uses the state of the current staging area.

  • image_2 – Second image hash / object. If None, uses the state of the current staging area.

  • aggregate – If True, returns a tuple of integers denoting added, removed and updated rows between the two images.

Returns

If the table doesn’t exist in one of the images, returns True if it was added and False if it was removed. If aggregate is True, returns the aggregation of changes as specified before. Otherwise, returns a list of changes where each change is a tuple of (True for added, False for removed, row contents).
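The aggregate=True result can be modelled as simple set arithmetic over {pk: row} mappings (a sketch, not how diff actually materializes the tables):

```python
def diff_aggregate(rows_1, rows_2):
    """Compute (added, removed, updated) between two {pk: row} mappings."""
    added = len(rows_2.keys() - rows_1.keys())
    removed = len(rows_1.keys() - rows_2.keys())
    updated = sum(
        1 for pk in rows_1.keys() & rows_2.keys() if rows_1[pk] != rows_2[pk]
    )
    return added, removed, updated

old = {1: ("apple",), 2: ("banana",)}
new = {2: ("cherry",), 3: ("date",)}
print(diff_aggregate(old, new))  # (1, 1, 1): one added, removed, updated
```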

dump(stream: _io.TextIOWrapper, exclude_object_contents: bool = False) → None

Creates an SQL dump with the metadata required for the repository and all of its objects.

Parameters
  • stream – Stream to dump the data into.

  • exclude_object_contents – Only dump the metadata but not the actual object contents.

classmethod from_schema(schema: str) → splitgraph.core.repository.Repository

Convert a Postgres schema name of the format namespace/repository to a Splitgraph repository object.
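The equivalent parsing logic is a split on the first slash; a hypothetical re-implementation for illustration (the real method may also validate identifiers):

```python
def parse_schema_name(schema):
    """Split 'namespace/repository' into its two components.

    A schema without a slash maps to a repository with an empty namespace.
    """
    if "/" in schema:
        namespace, repository = schema.split("/", 1)
    else:
        namespace, repository = "", schema
    return namespace, repository

print(parse_schema_name("noaa/climate"))  # ('noaa', 'climate')
print(parse_schema_name("climate"))       # ('', 'climate')
```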

classmethod from_template(template: splitgraph.core.repository.Repository, namespace: Optional[str] = None, repository: Optional[str] = None, engine: Optional[splitgraph.engine.postgres.engine.PostgresEngine] = None, object_engine: Optional[splitgraph.engine.postgres.engine.PostgresEngine] = None) → splitgraph.core.repository.Repository

Create a Repository from an existing one replacing some of its attributes.

get_all_hashes_tags() → List[Tuple[Optional[str], str]]

Gets all tagged images and their hashes in a given repository.

Returns

List of (image_hash, tag)

get_local_size() → int

Get the actual size used by this repository’s downloaded objects.

This might still be double-counted if the repository shares objects with other repositories.

Returns

Size of the repository in bytes.

get_size() → int

Get the physical size used by the repository’s data, counting objects that are used by multiple images only once. This is calculated from the metadata, the on-disk footprint might be smaller if not all of repository’s objects have been downloaded.

Returns

Size of the repository in bytes.

has_pending_changes() → bool

Detects if the repository has any pending changes (schema changes, table additions/deletions, content changes).

property head

Return the HEAD image for the repository or None if the repository isn’t checked out.

property head_strict

Return the HEAD image for the repository. Raise an exception if the repository isn’t checked out.

import_tables(*args, **kwargs) → str

Creates a new commit in target_repository with one or more tables linked to already-existing tables. After this operation, the HEAD of the target repository moves to the new commit and the new tables are materialized.

Parameters
  • tables – If not empty, must be a list of the same length as source_tables, specifying the names to store the tables under in the target repository.

  • source_repository – Repository to import tables from.

  • source_tables – List of tables to import. If empty, imports all tables.

  • image_hash – Image hash in the source repository to import tables from. Uses the current source HEAD by default.

  • foreign_tables – If True, copies all source tables to create a series of new snapshots instead of treating them as Splitgraph-versioned tables. This is useful for adding brand new tables (for example, from an FDW-mounted table).

  • do_checkout – If False, doesn’t check out the newly created image.

  • target_hash – Hash of the new image that the tables are recorded under. If None, gets chosen at random.

  • table_queries – If not [], it’s treated as a Boolean mask showing which entries in the tables list are instead SELECT SQL queries that form the target table. The queries have to be non-schema qualified and work only against tables in the source repository. Each target table created is the result of the respective SQL query. This is committed as a new snapshot.

  • parent_hash – If not None, must be the hash of the image to base the new image on. Existing tables from the parent image are preserved in the new image. If None, the current repository HEAD is used.

  • wrapper – Override the default class for the layered querying foreign data wrapper.

  • skip_validation – Don’t validate SQL used in import statements (used by the Splitfile executor that pre-formats the SQL).

Returns

Hash that the new image was stored under.

init(*args, **kwargs) → None

Initializes an empty repo with an initial commit (hash 0000…)

materialized_table(table_name: str, image_hash: Optional[str]) → Iterator[Tuple[str, str]]

A context manager that returns a pointer to a read-only materialized table in a given image. The table is deleted on exit from the context manager.

Parameters
  • table_name – Name of the table

  • image_hash – Image hash to materialize

Returns

(schema, table_name) where the materialized table is located.

pull(download_all: Optional[bool] = False, overwrite_objects: bool = False, overwrite_tags: bool = False, single_image: Optional[str] = None) → None

Synchronizes the state of the local Splitgraph repository with its upstream, optionally downloading all new objects created on the remote.

Parameters
  • download_all – If True, downloads all objects and stores them locally. Otherwise, will only download required objects when a table is checked out.

  • overwrite_objects – If True, will overwrite object metadata on the local repository for existing objects.

  • overwrite_tags – If True, will overwrite existing tags.

  • single_image – Limit the download to a single image hash/tag.

push(remote_repository: Optional[Repository] = None, overwrite_objects: bool = False, reupload_objects: bool = False, overwrite_tags: bool = False, handler: str = 'DB', handler_options: Optional[Dict[str, Any]] = None, single_image: Optional[str] = None) → splitgraph.core.repository.Repository

Inverse of pull: Pushes all local changes to the remote and uploads new objects.

Parameters
  • remote_repository – Remote repository to push changes to. If not specified, the current upstream is used.

  • handler – Name of the handler to use to upload objects. Use DB to push them to the remote or S3 to store them in an S3 bucket.

  • overwrite_objects – If True, will overwrite object metadata on the remote repository for existing objects.

  • reupload_objects – If True, will reupload objects for which metadata is uploaded.

  • overwrite_tags – If True, will overwrite existing tags on the remote repository.

  • handler_options – Extra options to pass to the handler. For example, see splitgraph.hooks.s3.S3ExternalObjectHandler.

  • single_image – Limit the upload to a single image hash/tag.

rollback_engines() → None

Rollback the underlying transactions on both engines that the repository uses.

run_sql(sql: Union[psycopg2.sql.Composed, str], arguments: Optional[Any] = None, return_shape: splitgraph.engine.ResultShape = <ResultShape.MANY_MANY: 4>) → Any

Execute an arbitrary SQL statement inside of this repository’s checked out schema.

set_tags(tags: Dict[str, Optional[str]]) → None

Sets tags for multiple images.

Parameters

tags – Dictionary of {image_hash: tag}.

to_schema() → str

Returns the engine schema that this repository gets checked out into.

uncheckout(*args, **kwargs) → None

Deletes the schema that the repository is checked out into.

Parameters

force – Discards all pending changes to the schema.

property upstream

The remote upstream repository that this local repository tracks.

splitgraph.core.repository.clone(remote_repository: Union[Repository, str], local_repository: Optional[Repository] = None, overwrite_objects: bool = False, overwrite_tags: bool = False, download_all: Optional[bool] = False, single_image: Optional[str] = None) → splitgraph.core.repository.Repository

Clones a remote Splitgraph repository or synchronizes remote changes with the local ones.

If the target repository has no set upstream engine, the source repository becomes its upstream.

Parameters
  • remote_repository – Remote Repository object to clone or the repository’s name. If a name is passed, the repository will be looked up on the current lookup path in order to find the engine the repository belongs to.

  • local_repository – Local repository to clone into. If None, uses the same name as the remote.

  • download_all – If True, downloads all objects and stores them locally. Otherwise, will only download required objects when a table is checked out.

  • overwrite_objects – If True, will overwrite object metadata on the local repository for existing objects.

  • overwrite_tags – If True, will overwrite existing tags.

  • single_image – If set, only get a single image with this hash/tag from the source.

Returns

A locally cloned Repository object.

splitgraph.core.repository.import_table_from_remote(remote_repository: splitgraph.core.repository.Repository, remote_tables: List[str], remote_image_hash: str, target_repository: splitgraph.core.repository.Repository, target_tables: List[Any], target_hash: str = None) → None

Shorthand for importing one or more tables from a yet-uncloned remote. Here, the remote image hash is required, as otherwise we aren’t necessarily able to determine what the remote head is.

Parameters
  • remote_repository – Remote Repository object

  • remote_tables – List of remote tables to import

  • remote_image_hash – Image hash to import the tables from

  • target_repository – Target repository to import the tables to

  • target_tables – Target table aliases

  • target_hash – Hash of the image that’s created with the import. Default random.

splitgraph.core.repository.table_exists_at(repository: splitgraph.core.repository.Repository, table_name: str, image: Optional[splitgraph.core.image.Image] = None) → bool

Determines whether a given table exists in a Splitgraph image without checking it out. If image is None, determines whether the table exists in the current staging area.

splitgraph.core.table module

Table metadata-related classes.

class splitgraph.core.table.QueryPlan(table: splitgraph.core.table.Table, quals: Optional[Sequence[Sequence[Tuple[str, str, Any]]]], columns: Sequence[str])

Bases: object

Represents the initial query plan (fragments to query) for given columns and qualifiers.

class splitgraph.core.table.Table(repository: Repository, image: Image, table_name: str, table_schema: List[splitgraph.core.types.TableColumn], objects: List[str])

Bases: object

Represents a Splitgraph table in a given image. Shouldn’t be created directly, use Table-loading methods in the splitgraph.core.image.Image class instead.

get_length() → int

Get the number of rows in this table.

This might be smaller than the total number of rows in all objects belonging to this table as some objects might overwrite each other.

Returns

Number of rows in table

get_query_plan(quals: Optional[Sequence[Sequence[Tuple[str, str, Any]]]], columns: Sequence[str], use_cache: bool = True) → splitgraph.core.table.QueryPlan

Start planning a query (preliminary steps before object downloading, like qualifier filtering).

Parameters
  • quals – Qualifiers in CNF form

  • columns – List of columns

  • use_cache – If True, will fetch the plan from the cache for the same qualifiers and columns.

Returns

QueryPlan

get_size() → int

Get the physical size used by the table’s objects (including those shared with other tables).

This is calculated from the metadata, the on-disk footprint might be smaller if not all of table’s objects have been downloaded.

Returns

Size of the table in bytes.

materialize(destination: str, destination_schema: Optional[str] = None, lq_server: Optional[str] = None) → None

Materializes a Splitgraph table in the target schema as a normal Postgres table, potentially downloading all required objects and using them to reconstruct the table.

Parameters
  • destination – Name of the destination table.

  • destination_schema – Name of the destination schema.

  • lq_server – If set, instead sets up a layered querying FDW for the table using this foreign server.

query(columns: List[str], quals: Sequence[Sequence[Tuple[str, str, Any]]])

Run a read-only query against this table without materializing it.

This is a wrapper around query_lazy() that force-evaluates the results, which might mean materializing more fragments than are actually needed.

Parameters
  • columns – List of columns from this table to fetch

  • quals – List of qualifiers in conjunctive normal form. See the documentation for FragmentManager.filter_fragments for the actual format.

Returns

List of dictionaries of results
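The quals argument is a list of OR-groups that are ANDed together (conjunctive normal form). A minimal evaluator for the (column, operator, value) clauses, with the operator set reduced to a few comparisons for illustration:

```python
import operator

OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}

def matches(row, quals):
    """True if `row` satisfies the CNF quals: every group must have at
    least one satisfied (column, operator, value) clause."""
    return all(
        any(OPS[op](row[col], value) for col, op, value in group)
        for group in quals
    )

# (a = 1 OR a > 5) AND b < 10
quals = [[("a", "=", 1), ("a", ">", 5)], [("b", "<", 10)]]
print(matches({"a": 6, "b": 3}, quals))  # True
print(matches({"a": 2, "b": 3}, quals))  # False
```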

query_indirect(columns: List[str], quals: Optional[Sequence[Sequence[Tuple[str, str, Any]]]]) → Tuple[Iterator[bytes], Callable, splitgraph.core.table.QueryPlan]

Run a read-only query against this table without materializing it. Instead of actual results, this returns a generator of SQL queries that the caller can use to get the results as well as a callback that the caller has to run after they’re done consuming the results.

In particular, the query generator will prefer returning direct queries to Splitgraph objects and only when those are exhausted will it start materializing delta-compressed fragments.

This is an advanced method: you probably want to call table.query().

Parameters
  • columns – List of columns from this table to fetch

  • quals – List of qualifiers in conjunctive normal form. See the documentation for FragmentManager.filter_fragments for the actual format.

Returns

Generator of queries (bytes), a callback and a query plan object (containing stats that are fully populated after the callback has been called to end the query).

query_lazy(columns: List[str], quals: Sequence[Sequence[Tuple[str, str, Any]]]) → Iterator[Iterator[Dict[str, Any]]]

Run a read-only query against this table without materializing it.

Parameters
  • columns – List of columns from this table to fetch

  • quals – List of qualifiers in conjunctive normal form. See the documentation for FragmentManager.filter_fragments for the actual format.

Returns

Generator of dictionaries of results.

reindex(extra_indexes: Dict[str, Union[List[str], Dict[str, Dict[str, Any]]]], raise_on_patch_objects=True) → List[str]

Run extra indexes on all objects in this table and update their metadata. This only works on objects that don’t have any deletions or upserts (have a deletion hash of 000000…).

Parameters
  • extra_indexes – Dictionary of {index_type: column: index_specific_kwargs}.

  • raise_on_patch_objects – If True, will raise an exception if any objects in the table overwrite any other objects. If False, will log a warning but will reindex all non-patch objects.

Returns

List of objects that were reindexed.

splitgraph.core.table.create_foreign_table(schema: str, server: str, table_name: str, schema_spec: List[splitgraph.core.types.TableColumn], internal_table_name: Optional[str] = None, extra_options: Optional[Dict[str, str]] = None)

splitgraph.core.table.merge_index_data(current_index: Dict[str, Any], new_index: Dict[str, Any])