splitgraph.core package
Subpackages
Submodules
splitgraph.core.common module
Common internal functions used by Splitgraph commands.
- class splitgraph.core.common.CallbackList(iterable=(), /)
- Bases: list
- Used to pass around and call multiple callbacks at once.
- class splitgraph.core.common.Tracer
- Bases: object
- Accumulates events and returns the times between them.
- get_durations() → List[Tuple[str, float]]
- Return all events and the durations between them.
- Returns
- List of (event name, time to this event from the previous event (or start))
 - get_total_time() float
- Returns
- Time from start to the final logged event. 
 
- log(event: str) → None
- Log an event at the current time.
- Parameters
- event – Event name
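The behaviour described above can be sketched with a minimal reimplementation (illustrative only, not the actual splitgraph class):

```python
import time

class Tracer:
    """Minimal sketch of an event tracer: accumulates events and
    returns the times between them."""

    def __init__(self):
        self.start = time.monotonic()
        self.events = []  # list of (event name, timestamp)

    def log(self, event):
        # Log an event at the current time
        self.events.append((event, time.monotonic()))

    def get_durations(self):
        # Time from each event to the previous event (or start)
        result, prev = [], self.start
        for name, timestamp in self.events:
            result.append((name, timestamp - prev))
            prev = timestamp
        return result

    def get_total_time(self):
        # Time from start to the final logged event
        return self.events[-1][1] - self.start
```

Note that the durations returned by `get_durations()` sum to `get_total_time()` by construction.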
 
- splitgraph.core.common.adapt(value: Any, pg_type: str) Any
- Coerces a value with a PG type into its Python equivalent.
- Parameters
- value – Value 
- pg_type – Postgres datatype 
 
- Returns
- Coerced value. 
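For illustration, a simplified sketch of such a coercion routine (hypothetical; the real adapt handles many more Postgres datatypes):

```python
from decimal import Decimal

def adapt(value, pg_type):
    # Hypothetical sketch: coerce a value with a PG type to Python
    if value is None:
        return None
    if pg_type in ("smallint", "integer", "bigint"):
        return int(value)
    if pg_type in ("real", "double precision"):
        return float(value)
    if pg_type == "numeric":
        return Decimal(value)
    return value
```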
 
- splitgraph.core.common.aggregate_changes(query_result: List[Tuple[int, int]], initial: Optional[Tuple[int, int, int]] = None) Tuple[int, int, int]
- Adds a changeset to the aggregated diff result.
- splitgraph.core.common.coerce_val_to_json(val: Any) Any
- Turn a Python value to a string/float that can be stored as JSON. 
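A sketch of this kind of conversion for common non-JSON-serializable Postgres values (hypothetical; the real function may cover other types):

```python
from datetime import date, datetime, time
from decimal import Decimal

def coerce_val_to_json(val):
    # Hypothetical sketch: make a value JSON-storable
    if isinstance(val, Decimal):
        return str(val)
    if isinstance(val, (datetime, date, time)):
        return val.isoformat()
    return val
```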
- splitgraph.core.common.ensure_metadata_schema(engine: PsycopgEngine) None
- Create or migrate the metadata schema splitgraph_meta that stores the hash tree of schema snapshots (images), tags and tables. This means we can’t mount anything under the schema splitgraph_meta – much like we can’t have a folder “.git” under Git version control… 
- splitgraph.core.common.gather_sync_metadata(target: Repository, source: Repository, overwrite_objects=False, overwrite_tags=False, single_image: Optional[str] = None) Any
- Inspects two Splitgraph repositories and gathers metadata that is required to bring target up to date with source.
- Parameters
- target – Target Repository object 
- source – Source repository object 
- overwrite_objects – If True, will return metadata for all objects belonging to new images (or existing image if single_image=True) 
- single_image – If set, only grab a single image with this hash/tag from the source. 
- overwrite_tags – If True and single_image is set, will return all tags for that image. If single_image is not set, will return all tags in the source repository. If False, will only return tags in the source that don’t exist on the target. 
 
- Returns
- Tuple of metadata for new_images, new_tables, object_locations, object_meta, tags 
 
- splitgraph.core.common.get_temporary_table_id() str
- Generate a random ID for temporary/staging objects that haven’t had their ID calculated yet. 
- splitgraph.core.common.getrandbits(k) → x. Generates an int with k random bits.
- splitgraph.core.common.manage_audit(func: Callable) Callable
- A decorator to be put around various Splitgraph commands that adds/removes audit triggers for new/committed/deleted tables. 
- splitgraph.core.common.manage_audit_triggers(engine: PostgresEngine, object_engine: Optional[PostgresEngine] = None) None
- Does bookkeeping on audit triggers / audit table:
- Detect tables that are being audited that don’t need to be any more (e.g. they’ve been unmounted)
- Drop audit triggers for those and delete all audit info for them 
- Set up audit triggers for new tables 
- If the metadata engine isn’t the same as the object engine, this does nothing.
- Parameters
- engine – Metadata engine with information about images and their checkout state 
- object_engine – Object engine where the checked-out table and the audit triggers are located. 
 
 
- splitgraph.core.common.set_head(repository: Repository, image: Optional[str]) None
- Sets the HEAD pointer of a given repository to a given image. Shouldn’t be used directly. 
- splitgraph.core.common.set_tag(repository: Repository, image_hash: Optional[str], tag: str) None
- Internal function – add a tag to an image. 
- splitgraph.core.common.set_tags_batch(repository: Repository, hashes_tags: List[Tuple[str, str]]) None
- splitgraph.core.common.slow_diff(repository: Repository, table_name: str, image_1: Optional[str], image_2: Optional[str], aggregate: bool) Union[Tuple[int, int, int], List[Tuple[bool, Tuple]]]
- Materialize both tables and manually diff them 
- splitgraph.core.common.unmount_schema(engine: PsycopgEngine, schema: str) None
splitgraph.core.engine module
Routines for managing Splitgraph engines, including looking up repositories and managing objects.
- splitgraph.core.engine.get_current_repositories(engine: PostgresEngine) List[Tuple[Repository, Optional[Image]]]
- Lists all repositories currently in the engine.
- Parameters
- engine – Engine 
- Returns
- List of (Repository object, current HEAD image) 
 
- splitgraph.core.engine.init_engine(skip_object_handling: bool = False) None
- Initializes the engine by:
- performing any required engine-custom initialization
- creating the metadata tables 
 - Parameters
- skip_object_handling – If True, skips installing routines related to object handling and checkouts (like audit triggers and CStore management). 
 
- splitgraph.core.engine.lookup_repository(name: str, include_local: bool = False) Repository
- Queries the SG engines on the lookup path to locate one hosting the given repository.
- Parameters
- name – Repository name 
- include_local – If True, also queries the local engine 
 
- Returns
- Local or remote Repository object 
 
- splitgraph.core.engine.repository_exists(repository: Repository) bool
- Checks if a repository exists on the engine.
- Parameters
- repository – Repository object 
 
splitgraph.core.fdw_checkout module
splitgraph.core.fragment_manager module
Routines related to storing tables as fragments.
- class splitgraph.core.fragment_manager.Digest(shorts: Tuple[int, ...])
- Bases: object
- Homomorphic hashing similar to LtHash (but limited to being backed by 256-bit hashes). The main property is that for any rows A, B, LtHash(A) + LtHash(B) = LtHash(A+B). This is done by construction: we simply hash individual rows and then do bit-wise addition / subtraction of individual hashes to come up with the full table hash.
- Hence, the content hash of any Splitgraph table fragment is the sum of hashes of its added rows minus the sum of hashes of its deleted rows (including the old values of the rows that have been updated). This has a very useful implication: the hash of a full Splitgraph table is equal to the sum of hashes of its individual fragments.
- This property can be used to simplify deduplication.
- classmethod empty() → splitgraph.core.fragment_manager.Digest
- Return an empty Digest instance such that for any Digest D, D + empty == D - empty == D 
 - classmethod from_hex(hex_string: str) splitgraph.core.fragment_manager.Digest
- Create a Digest from a 64-character (256-bit) hexadecimal string.
 - classmethod from_memoryview(memory: Union[bytes, memoryview]) splitgraph.core.fragment_manager.Digest
- Create a Digest from a 256-bit memoryview/bytearray. 
 - hex() str
- Convert the hash into a hexadecimal value. 
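The homomorphic-addition property is easy to demonstrate with a toy version of the scheme (word-wise addition of SHA-256 hashes modulo 2**16; a sketch only, the actual Digest implementation may differ in details):

```python
import hashlib
from functools import reduce

MOD = 1 << 16  # a 256-bit hash is treated as 16 16-bit words

def row_hash(row):
    # Hash a single row into 16 16-bit words (toy stand-in for Digest)
    digest = hashlib.sha256(repr(row).encode()).digest()
    return tuple(int.from_bytes(digest[i:i + 2], "big") for i in range(0, 32, 2))

def add(d1, d2):
    # Word-wise addition modulo 2**16: the "+" of the homomorphic hash
    return tuple((a + b) % MOD for a, b in zip(d1, d2))

def sub(d1, d2):
    # Word-wise subtraction: used to account for deleted rows
    return tuple((a - b) % MOD for a, b in zip(d1, d2))

EMPTY = (0,) * 16

def table_hash(rows):
    # Hash of a table/fragment: sum of the hashes of its rows
    return reduce(add, map(row_hash, rows), EMPTY)
```

Because addition is commutative and associative, the hash of a full table equals the sum of the hashes of its fragments, regardless of how the rows are chunked.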
 
- class splitgraph.core.fragment_manager.FragmentManager(object_engine: PostgresEngine, metadata_engine: Optional[PostgresEngine] = None)
- Bases: splitgraph.core.metadata_manager.MetadataManager
- A storage engine for Splitgraph tables. Each table can be stored as one or more immutable fragments that can optionally overwrite each other. When a new table is created, it’s split up into multiple base fragments. When a new version of the table is written, the audit log is inspected and one or more patch fragments are created, based on the fragments the previous version of the table consisted of. Only the top fragments in this stack are stored in the table metadata: to reconstruct the whole table, the links from the top fragments down to the base fragments have to be followed.
- In addition, the fragment manager also supports min-max indexing on fragments: this is used to only fetch fragments that are required for a given query.
- calculate_content_hash(schema: str, table: str, table_schema: Optional[List[splitgraph.core.types.TableColumn]] = None, chunk_condition_sql: Optional[psycopg2.sql.Composable] = None, chunk_condition_args: Optional[List[Any]] = None) → Tuple[str, int]
- Calculates the homomorphic hash of table contents.
- Parameters
- schema – Schema the table belongs to 
- table – Name of the table 
- table_schema – Schema of the table 
- chunk_condition_sql – Optional SQL clause restricting the hash to a single chunk of the table
- chunk_condition_args – Arguments for the chunk condition clause
 
- Returns
- A 64-character (256-bit) hexadecimal string with the content hash of the table, together with the number of rows included in the hash.
 
 - calculate_fragment_insertion_hash_stats(schema: str, table: str, table_schema: Optional[List[splitgraph.core.types.TableColumn]] = None) Tuple[splitgraph.core.fragment_manager.Digest, int]
- Calculate the homomorphic hash of just the rows that a given fragment inserts.
- Parameters
- schema – Schema the fragment is stored in.
- table – Name of the table the fragment is stored in.
- Returns
- A Digest object and the number of inserted rows
 - create_base_fragment(source_schema: str, source_table: str, namespace: str, chunk_condition_sql: Optional[psycopg2.sql.Composable] = None, chunk_condition_args: Optional[List[Any]] = None, extra_indexes: Optional[Dict[str, Union[List[str], Dict[str, Dict[str, Any]]]]] = None, in_fragment_order: Optional[List[str]] = None, overwrite: bool = False, table_schema: Optional[List[splitgraph.core.types.TableColumn]] = None) str
 - delete_objects(objects: Union[Set[str], List[str]]) None
- Deletes objects from the Splitgraph cache.
- Parameters
- objects – A sequence of objects to be deleted 
 
 - filter_fragments(object_ids: List[str], table: Table, quals: Any) List[str]
- Performs fuzzy filtering on the given object IDs using the index and a set of qualifiers, discarding objects that definitely do not match the qualifiers.
- Parameters
- object_ids – List of object IDs to filter. 
- table – A Table object the objects belong to. 
- quals – List of qualifiers in conjunctive normal form that will be matched against the index. Objects that definitely don’t match these qualifiers will be discarded.
- A list containing [[qual_1, qual_2], [qual_3, qual_4]] will be interpreted as (qual_1 OR qual_2) AND (qual_3 OR qual_4).
- Each qual is a tuple of (column_name, operator, value) where operator can be one of >, >=, <, <=, =.
- For unknown operators, it will be assumed that all fragments might match that clause.
 
- Returns
- List of objects that might match the given qualifiers. 
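The per-fragment filtering logic can be sketched as follows, using a hypothetical {column: (min, max)} index entry (a simplification of the real index format):

```python
def fragment_might_match(index, quals):
    """Sketch: decide whether a fragment might match a set of quals.

    index: {column: (min, max)} from the fragment's min-max index.
    quals: CNF, e.g. [[("a", ">", 5)], [("b", "=", "x"), ("b", "=", "y")]].
    """
    def qual_possible(col, op, value):
        if col not in index:
            return True
        lo, hi = index[col]
        if op == "=":
            return lo <= value <= hi
        if op == ">":
            return hi > value
        if op == ">=":
            return hi >= value
        if op == "<":
            return lo < value
        if op == "<=":
            return lo <= value
        return True  # unknown operator: assume the fragment might match

    # All OR-clauses must be possibly satisfiable (AND of ORs)
    return all(any(qual_possible(*q) for q in clause) for clause in quals)
```

Note this is only a negative filter: a True result means the fragment still has to be scanned, while a False result proves no row in it can match.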
 
 - generate_object_index(object_id: str, table_schema: List[splitgraph.core.types.TableColumn], changeset: Optional[Dict[Tuple[str, ...], Tuple[bool, Dict[str, Any], Dict[str, Any]]]] = None, extra_indexes: Optional[Dict[str, Union[List[str], Dict[str, Dict[str, Any]]]]] = None) Dict[str, Any]
- Queries the max/min values of a given fragment for each column, used to speed up querying.
- Parameters
- object_id – ID of an object 
- table_schema – Schema of the table the object belongs to. 
- changeset – Optional, if specified, the old row values are included in the index. 
- extra_indexes – Dictionary of {index_type: column: index_specific_kwargs}. 
 
- Returns
- Dict containing the object index. 
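The min-max part of such an index can be sketched as (hypothetical helper over in-memory rows; the real routine queries the fragment on the engine):

```python
def generate_minmax_index(rows, columns):
    # Sketch: per-column [min, max] ranges over a fragment's rows,
    # skipping NULLs (columns with only NULLs get no index entry)
    index = {}
    for i, col in enumerate(columns):
        values = [row[i] for row in rows if row[i] is not None]
        if values:
            index[col] = [min(values), max(values)]
    return index
```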
 
 - generate_surrogate_pk(table: Table, object_pks: List[Tuple[Any, Any]]) List[Tuple[Any, Any]]
- When partitioning data, if the table doesn’t have a primary key, we use a “surrogate” primary key by concatenating the whole row as a string on the PG side (this is because the whole row can sometimes contain NULLs which we can’t compare in PG).
- We need to mimic this when calculating whether the objects we’re about to scan through overlap: e.g. using string comparison, “(some_country, 100)” < “(some_country, 20)”, whereas using typed comparison, (some_country, 100) > (some_country, 20).
- To do this, we use a similar hack to the one used when calculating changeset hashes: to avoid having to reproduce how PG’s ::text works, we give PG back the rows and get it to cast them to text for us.
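The comparison discrepancy described above can be seen directly in Python:

```python
# String (lexicographic) comparison orders composite keys differently
# from typed comparison, which is why surrogate PKs need special care:
assert ("some_country", "100") < ("some_country", "20")   # "1" sorts before "2"
assert ("some_country", 100) > ("some_country", 20)       # typed comparison
```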
 - get_min_max_pks(fragments: List[str], table_pks: List[Tuple[str, str]]) List[Tuple[Tuple, Tuple]]
- Get PK ranges for given fragments using the index (without reading the fragments).
- Parameters
- fragments – List of object IDs (must be registered and with the same schema) 
- table_pks – List of tuples (column, type) that form the object PK. 
 
- Returns
- List of (min, max) PK for every fragment where PK is a tuple. If a fragment doesn’t exist or doesn’t have a corresponding index entry, a SplitGraphError is raised. 
 
 - record_table_as_base(repository: Repository, table_name: str, image_hash: str, chunk_size: Optional[int] = 10000, source_schema: Optional[str] = None, source_table: Optional[str] = None, extra_indexes: Optional[Dict[str, Union[List[str], Dict[str, Dict[str, Any]]]]] = None, in_fragment_order: Optional[List[str]] = None, overwrite: bool = False, table_schema: Optional[List[splitgraph.core.types.TableColumn]] = None) List[str]
- Copies the full table verbatim into one or more new base fragments and registers them.
- Parameters
- repository – Repository 
- table_name – Table name 
- image_hash – Hash of the new image 
- chunk_size – If specified, splits the table into multiple objects with a given number of rows 
- source_schema – Override the schema the source table is stored in 
- source_table – Override the name of the table the source is stored in 
- extra_indexes – Dictionary of {index_type: column: index_specific_kwargs}. 
- in_fragment_order – Key to sort data inside each chunk by. 
- overwrite – Overwrite physical objects that already exist. 
- table_schema – Override the columns that will be picked from the original table (e.g. to change their order or primary keys). Note that the schema must be a subset of the original schema and this method doesn’t verify PK constraints. 
 
 
 - record_table_as_patch(old_table: Table, schema: str, image_hash: str, new_schema_spec: List[splitgraph.core.types.TableColumn] = None, split_changeset: bool = False, extra_indexes: Optional[Dict[str, Union[List[str], Dict[str, Dict[str, Any]]]]] = None, in_fragment_order: Optional[List[str]] = None, overwrite: bool = False) None
- Flushes the pending changes from the audit table for a given table and records them, registering the new objects.
- Parameters
- old_table – Table object pointing to the current HEAD table 
- schema – Schema the table is checked out into. 
- image_hash – Image hash to store the table under 
- new_schema_spec – New schema of the table (use the old table’s schema by default). 
- split_changeset – See Repository.commit for reference 
- extra_indexes – Dictionary of {index_type: column: index_specific_kwargs}. 
 
 
 - split_changeset_boundaries(changeset, change_key, objects)
 
- splitgraph.core.fragment_manager.get_chunk_groups(chunks: List[Tuple[str, Any, Any]]) List[List[Tuple[str, Any, Any]]]
- Takes a list of chunks and their boundaries and combines them into independent groups such that no chunks from two different groups overlap with each other (intervals are assumed to be closed, e.g. chunk (1,2) overlaps with chunk (2,3)).
- The original order of chunks is preserved within each group.
- For example, 4 chunks A, B, C, D that don’t overlap each other will be grouped into 4 groups [A], [B], [C], [D].
- If A overlaps B, the result will be [A, B], [C], [D].
- If in addition B overlaps C (but not A), the result will be [A, B, C], [D].
- If in addition D overlaps any of A, B or C, the result will be [A, B, C, D] (despite the fact that D is located before A: it will be last since it was last in the original list).
- Parameters
- chunks – List of (chunk_id, start, end) 
- Returns
- List of lists of (chunk_id, start, end) 
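The grouping behaviour described above can be sketched with a small union-find implementation (illustrative only, not the actual code):

```python
def get_chunk_groups(chunks):
    """Sketch: group chunks into overlap-connected components.

    chunks: list of (chunk_id, start, end); intervals are closed.
    """
    parent = list(range(len(chunks)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    # Union any two chunks whose closed intervals overlap
    for i, (_, s1, e1) in enumerate(chunks):
        for j, (_, s2, e2) in enumerate(chunks[i + 1:], start=i + 1):
            if s1 <= e2 and s2 <= e1:
                parent[find(i)] = find(j)

    # Collect groups, preserving the original chunk order within each
    groups, order = {}, []
    for i, chunk in enumerate(chunks):
        root = find(i)
        if root not in groups:
            groups[root] = []
            order.append(root)
        groups[root].append(chunk)
    return [groups[root] for root in order]
```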
 
splitgraph.core.image module
Image representation and provenance
- class splitgraph.core.image.Image(image_hash: str, parent_id: Optional[str], created: datetime.datetime, comment: str, provenance_data: List[Dict[str, Union[str, List[str], List[bool], List[Dict[str, str]]]]], repository: Repository)
- Bases: tuple
- Represents a Splitgraph image. Shouldn’t be created directly; use the Image-loading methods in the splitgraph.core.repository.Repository class instead.
- checkout(force: bool = False, layered: bool = False) → None
- Checks the image out, changing the current HEAD pointer. Raises an error if there are pending changes to its checkout.
- Parameters
- force – Discards all pending changes to the schema. 
- layered – If True, uses layered querying to check out the image (doesn’t materialize tables inside of it). 
 
 
 - comment: str
- Alias for field number 3 
 - created: datetime.datetime
- Alias for field number 2 
 - delete_tag(tag: str) None
- Deletes a tag from an image.
- Parameters
- tag – Tag to delete. 
 
 - property engine
 - get_log() List[splitgraph.core.image.Image]
- Repeatedly gets the parent of a given image until it reaches the bottom. 
 - get_parent_children() Tuple[Optional[str], List[str]]
- Gets the parent and a list of children of a given image. 
 - get_size() int
- Get the physical size used by the image’s objects (including those that might be shared with other images).
- This is calculated from the metadata; the on-disk footprint might be smaller if not all of the image’s objects have been downloaded.
- Returns
- Size of the image in bytes. 
 
 - get_table(table_name: str) splitgraph.core.table.Table
- Returns a Table object representing a version of a given table. Contains a list of objects that the table is linked to and the table’s schema.
- Parameters
- table_name – Name of the table 
- Returns
- Table object 
 
 - get_tables() List[str]
- Gets the names of all tables inside of an image. 
 - get_tags()
- Lists all tags that this image has. 
 - image_hash: str
- Alias for field number 0 
 - lq_checkout(target_schema: Optional[str] = None, wrapper: Optional[str] = 'splitgraph.core.fdw_checkout.QueryingForeignDataWrapper', only_tables: Optional[List[str]] = None) None
- Intended to be run on the sgr side. Initializes the FDW for all tables in a given image, allowing them to be queried directly without materializing the tables.
 - property object_engine
 - parent_id: Optional[str]
- Alias for field number 1 
 - provenance(reverse=False, engine=None) List[Tuple[Repository, str]]
- Inspects the image’s parent chain to come up with a set of repositories and their hashes that it was created from.
- If reverse is True, returns a list of images that were created _from_ this image. If this image is on a remote repository, engine can be passed in to override the engine used for the lookup of dependents.
- Returns
- List of (repository, image_hash) 
 
 - provenance_data: List[Dict[str, Union[str, List[str], List[bool], List[Dict[str, str]]]]]
- Alias for field number 4 
 - query_schema(wrapper: Optional[str] = 'splitgraph.core.fdw_checkout.QueryingForeignDataWrapper', commit: bool = True) Iterator[str]
- Creates a temporary schema with tables in this image mounted as foreign tables that can be accessed via read-only layered querying. On exit from the context manager, the schema is discarded.
- Returns
- The name of the schema the image is located in. 
 
 - repository: Repository
- Alias for field number 5 
 - set_provenance(provenance_data: List[Dict[str, Union[str, List[str], List[bool], List[Dict[str, str]]]]]) None
- Sets the image’s provenance. Internal function called by the Splitfile interpreter; shouldn’t be called directly as it changes the image after it’s been created.
- Parameters
- provenance_data – List of parsed Splitfile commands and their data. 
 
 - tag(tag: str) None
- Tags a given image. All tags are unique inside of a repository. If a tag already exists, it’s removed from the previous image and given to the new image.
- Parameters
- tag – Tag to set. ‘latest’ and ‘HEAD’ are reserved tags. 
 
 - to_splitfile(ignore_irreproducible: bool = False, source_replacement: Optional[Dict[Repository, str]] = None) List[str]
- Recreate the Splitfile that can be used to reconstruct this image.
- Parameters
- ignore_irreproducible – If True, ignore commands from irreproducible Splitfile lines (like MOUNT or custom commands) and instead emit a comment (this results in an invalid Splitfile). 
- source_replacement – A dictionary of repositories and image hashes/tags specifying how to replace the dependencies of this Splitfile (table imports and FROM commands). 
 
- Returns
- A list of Splitfile commands that can be fed back into the executor. 
 
 
- splitgraph.core.image.getrandbits(k) → x. Generates an int with k random bits.
- splitgraph.core.image.reconstruct_splitfile(provenance_data: List[Dict[str, Union[str, List[str], List[bool], List[Dict[str, str]]]]], ignore_irreproducible: bool = False, source_replacement: Optional[Dict[Repository, str]] = None) List[str]
- Recreate the Splitfile that can be used to reconstruct an image. 
splitgraph.core.image_manager module
- class splitgraph.core.image_manager.ImageManager(repository: Repository)
- Bases: object
- Collects various image-related functions.
- add(parent_id: Optional[str], image: str, created: Optional[datetime.datetime] = None, comment: Optional[str] = None, provenance_data: Optional[List[Dict[str, Union[str, List[str], List[bool], List[Dict[str, str]]]]]] = None) → None
- Registers a new image in the Splitgraph image tree.
- Internal method used by actual image creation routines (committing, importing or pulling).
- Parameters
- parent_id – Parent of the image 
- image – Image hash 
- created – Creation time (defaults to current timestamp) 
- comment – Comment (defaults to empty) 
- provenance_data – Provenance data that can be used to reconstruct the image. 
 
 
 - add_batch(images: List[splitgraph.core.image.Image]) None
- Like add, but registers multiple images at the same time. Used in push/pull to avoid a roundtrip to the registry for each image.
- Parameters
- images – List of Image objects. Namespace and repository will be patched with this repository.
 - by_hash(image_hash: str) splitgraph.core.image.Image
- Returns an image corresponding to a given (possibly shortened) image hash. If the image hash is ambiguous, raises an error. If the image does not exist, raises an error or returns None.
- Parameters
- image_hash – Image hash (can be shortened). 
- Returns
- Image 
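Resolution of a shortened hash of this kind can be sketched as (hypothetical helper; the real lookup runs against the metadata tables):

```python
def resolve_image_hash(known_hashes, prefix):
    # Sketch: resolve a possibly shortened hash, erroring on ambiguity
    matches = [h for h in known_hashes if h.startswith(prefix)]
    if not matches:
        raise ValueError("No image found starting with %s" % prefix)
    if len(matches) > 1:
        raise ValueError("Ambiguous image hash %s" % prefix)
    return matches[0]
```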
 
 - by_tag(tag: str, raise_on_none: bool = True) Optional[splitgraph.core.image.Image]
- Returns an image with a given tag.
- Parameters
- tag – Tag. ‘latest’ is a special case: it returns the most recent image in the repository. 
- raise_on_none – Whether to raise an error or return None if the tag doesn’t exist. 
 
 
 - delete(images: Iterable[str]) None
- Deletes a set of Splitgraph images from the repository. Note this doesn’t check whether this will orphan some other images in the repository and can make the state of the repository invalid.
- Image deletions won’t be replicated on push/pull (those can only add new images).
- Parameters
- images – List of image IDs 
 
 - get_all_child_images(start_image: str) Set[str]
- Get all children of start_image of any degree. 
 - get_all_parent_images(start_images: Set[str]) Set[str]
- Get all parents of the ‘start_images’ set of any degree. 
 
splitgraph.core.image_mounting module
- class splitgraph.core.image_mounting.DefaultImageMounter(engine: PostgresEngine)
- Bases: splitgraph.core.image_mounting.ImageMounter
- mount(images: List[Tuple[str, str, str]]) → Dict[Tuple[str, str, str], str]
- Mount images into arbitrary schemas and return the image -> schema map 
 - unmount() None
- Unmount the previously mounted images 
 
- class splitgraph.core.image_mounting.ImageMapper(object_engine: PostgresEngine)
- Bases: object
- Image “mapper” used in Splitfile execution that returns the canonical form of an image and where it’s mounted.
- get_provenance_data() → Dict[str, Union[str, List[str], List[bool], List[Dict[str, str]]]]
 - setup_lq_mounts() None
 - teardown_lq_mounts() None
 
- class splitgraph.core.image_mounting.ImageMounter
- Bases: abc.ABC
- Image “mounter” that’s used for non-Splitfile transformation jobs (data source plugins that implement TransformingDataSource).
- abstract mount(images: List[Tuple[str, str, str]]) → Dict[Tuple[str, str, str], str]
- Mount images into arbitrary schemas and return the image -> schema map 
 - abstract unmount() None
- Unmount the previously mounted images 
 
splitgraph.core.metadata_manager module
Classes related to managing table/image/object metadata tables.
- class splitgraph.core.metadata_manager.MetadataManager(metadata_engine: PsycopgEngine)
- Bases: object
- A data access layer for the metadata tables in the splitgraph_meta schema that concerns itself with image, table and object information.
- cleanup_metadata() → List[str]
- Go through the current metadata and delete all objects that aren’t required by any table on the engine.
- Returns
- List of objects that have been deleted. 
 
 - delete_object_meta(object_ids: Sequence[str])
- Delete metadata for multiple objects (external locations, indexes, hashes). This doesn’t delete physical objects.
- Parameters
- object_ids – Object IDs to delete 
 
 - get_all_objects() List[str]
- Gets all objects currently in the Splitgraph tree.
- Returns
- List of object IDs. 
 
 - get_external_object_locations(objects: List[str]) List[Tuple[str, str, str]]
- Gets external locations for objects.
- Parameters
- objects – List of object IDs stored externally. 
- Returns
- List of (object_id, location, protocol). 
 
 - get_new_objects(object_ids: List[str]) List[str]
- Get object IDs from the passed list that don’t exist in the tree.
- Parameters
- object_ids – List of objects to check 
- Returns
- List of unknown object IDs. 
 
 - get_object_meta(objects: List[str]) Dict[str, splitgraph.core.metadata_manager.Object]
- Get metadata for multiple Splitgraph objects from the tree.
- Parameters
- objects – List of objects to get metadata for. 
- Returns
- Dictionary of object_id -> Object 
 
 - get_objects_for_repository(repository: Repository, image_hash: Optional[str] = None) List[str]
 - get_unused_objects(threshold: Optional[int] = None) List[Tuple[str, datetime.datetime]]
- Get a list of all objects in the metadata that aren’t used by any table and can be safely deleted.
- Parameters
- threshold – Only return objects that were created earlier than this (in minutes) 
- Returns
- List of objects and their creation times. 
 
 - overwrite_table(repository: Repository, image_hash: str, table_name: str, table_schema: List[splitgraph.core.types.TableColumn], objects: List[str])
 - register_object_locations(object_locations: List[Tuple[str, str, str]]) None
- Registers external locations (e.g. HTTP or S3) for Splitgraph objects. Objects must already be registered in the object tree.
- Parameters
- object_locations – List of (object_id, location, protocol). 
 
 - register_objects(objects: List[splitgraph.core.metadata_manager.Object], namespace: Optional[str] = None) None
- Registers multiple Splitgraph objects in the tree.
- Parameters
- objects – List of Object objects. 
- namespace – If specified, overrides the original object namespace, required in the case where the remote repository has a different namespace than the local one. 
 
 
 - register_tables(repository: Repository, table_meta: List[Tuple[str, str, List[splitgraph.core.types.TableColumn], List[str]]]) None
- Links tables in an image to physical objects that they are stored as. Objects must already be registered in the object tree.
- Parameters
- repository – Repository that the tables belong to. 
- table_meta – A list of (image_hash, table_name, table_schema, object_ids). 
 
 
 
- class splitgraph.core.metadata_manager.Object(object_id: str, format: str, namespace: str, size: int, created: datetime.datetime, insertion_hash: str, deletion_hash: str, object_index: Dict[str, Any], rows_inserted: int, rows_deleted: int)
- Bases: tuple
- Represents a Splitgraph object that tables are composed of.
- created: datetime.datetime
- Alias for field number 4 
 - deletion_hash: str
- Alias for field number 6 
 - format: str
- Alias for field number 1 
 - insertion_hash: str
- Alias for field number 5 
 - namespace: str
- Alias for field number 2 
 - object_id: str
- Alias for field number 0 
 - object_index: Dict[str, Any]
- Alias for field number 7 
 - rows_deleted: int
- Alias for field number 9 
 - rows_inserted: int
- Alias for field number 8 
 - size: int
- Alias for field number 3 
 
splitgraph.core.migration module
- splitgraph.core.migration.get_installed_version(engine: PsycopgEngine, schema_name: str, version_table: str = 'version') Optional[Tuple[str, datetime.datetime]]
- splitgraph.core.migration.get_version_tuples(filenames: List[str]) List[Tuple[Optional[str], str]]
- splitgraph.core.migration.make_file_list(schema_name: str, migration_path: List[Optional[str]])
- Construct a list of file names from history of versions and schema name 
- splitgraph.core.migration.set_installed_version(engine: PsycopgEngine, schema_name: str, version: str, version_table: str = 'version')
- splitgraph.core.migration.source_files_to_apply(engine: PsycopgEngine, schema_name: str, schema_files: List[str], version_table: str = 'version', static: bool = False, target_version: Optional[str] = None) Tuple[List[str], str]
- Get the ordered list of .sql files to apply to the database 
splitgraph.core.object_manager module
Functions related to creating, deleting and keeping track of physical Splitgraph objects.
- class splitgraph.core.object_manager.ObjectManager(object_engine: PostgresEngine, metadata_engine: Optional[PostgresEngine] = None)
- Bases: splitgraph.core.fragment_manager.FragmentManager
- Brings the multiple manager classes together and manages the object cache (downloading and uploading objects as required in order to fulfill certain queries).
- cleanup() → List[str]
- Deletes all objects in the object_tree not required by any current repository, including their dependencies and their remote locations. Also deletes all objects not registered in the object_tree. 
 - download_objects(source: Optional[splitgraph.core.object_manager.ObjectManager], objects_to_fetch: List[str], object_locations: List[Tuple[str, str, str]]) List[str]
- Fetches the required objects from the remote and stores them locally. Does nothing for objects that already exist.
- Parameters
- source – Remote ObjectManager. If None, will only try to download objects from the external location. 
- objects_to_fetch – List of object IDs to download. 
- object_locations – List of custom object locations, encoded as tuples (object_id, object_url, protocol). 
 
 
 - ensure_objects(table: Optional[Table], objects: Optional[List[str]] = None, quals: Optional[Sequence[Sequence[Tuple[str, str, Any]]]] = None, defer_release: bool = False, tracer: Optional[splitgraph.core.common.Tracer] = None, upstream_manager: Optional[ObjectManager] = None) Iterator[Union[List[str], Tuple[List[str], splitgraph.core.common.CallbackList]]]
- Resolves the objects needed to materialize a given table and makes sure they are in the local splitgraph_meta schema.
- Whilst inside this manager, the objects are guaranteed to exist. On exit from it, the objects are marked as unneeded and can be garbage collected.
- Parameters
- table – Table to materialize 
- objects – List of objects to download: one of table or objects must be specified. 
- quals – Optional list of qualifiers to be passed to the fragment engine. Fragments that definitely do not match these qualifiers will be dropped. See the docstring for filter_fragments for the format. 
- defer_release – If True, won’t release the objects on exit. 
 
- Returns
- If defer_release is True: List of table fragments and a callback that the caller must call when the objects are no longer needed. If defer_release is False: just the list of table fragments. 
 
 - get_cache_occupancy() int
- Returns
- Space occupied by objects cached from external locations, in bytes. 
 
 - get_downloaded_objects(limit_to: Optional[List[str]] = None) List[str]
- Gets a list of objects currently in the Splitgraph cache (i.e. not only existing externally). - Parameters
- limit_to – If specified, only the objects in this list will be returned. 
- Returns
- List of object IDs. 
 
 - get_total_object_size()
- Returns
- Space occupied by all objects on the engine, in bytes. 
 
 - make_objects_external(objects: List[str], handler: str, handler_params: Dict[Any, Any]) None
- Uploads local objects to an external location and marks them as being cached locally (thus making it possible to evict or swap them out). - Parameters
- objects – Object IDs to upload. Will do nothing for objects that already exist externally. 
- handler – Object handler 
- handler_params – Extra handler parameters 
 
 
 - run_eviction(keep_objects: List[str], required_space: Optional[int] = None) None
- Delete enough objects with zero reference count (only those, since we guarantee that whilst refcount is >0, the object stays alive) to free at least required_space in the cache. - Parameters
- keep_objects – List of objects (besides those with nonzero refcount) that can’t be deleted. 
- required_space – Space, in bytes, to free. If the routine can’t free at least this much space, it shall raise an exception. If None, removes all eligible objects. 
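The eviction contract can be sketched in isolation: free at least required_space from zero-refcount objects, raise if that is impossible, and evict everything eligible when required_space is None. Object names, sizes and the eviction order here are illustrative, not the real cache policy:

```python
from typing import Dict, List, Optional

def run_eviction_sketch(
    object_sizes: Dict[str, int],
    refcounts: Dict[str, int],
    keep_objects: List[str],
    required_space: Optional[int] = None,
) -> List[str]:
    # Evict zero-refcount objects (except keep_objects) until enough
    # space is freed; with required_space=None, evict all eligible ones.
    evicted: List[str] = []
    freed = 0
    for obj, size in sorted(object_sizes.items()):
        if required_space is not None and freed >= required_space:
            break
        if refcounts.get(obj, 0) > 0 or obj in keep_objects:
            continue  # pinned or explicitly kept: can't delete
        evicted.append(obj)
        freed += size
    if required_space is not None and freed < required_space:
        raise RuntimeError("Could not free %d bytes" % required_space)
    return evicted
```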
 
 
 - upload_objects(target: splitgraph.core.object_manager.ObjectManager, objects_to_push: List[str], handler: str = 'DB', handler_params: Optional[Dict[Any, Any]] = None) Sequence[Tuple[str, Optional[str]]]
- Uploads physical objects to the remote or some other external location. - Parameters
- target – Target ObjectManager 
- objects_to_push – List of object IDs to upload. 
- handler – Name of the handler to use to upload objects. Use DB to push them to the remote, FILE to store them in a directory that can be accessed from the client and HTTP to upload them to an HTTP server. 
- handler_params – For HTTP, a dictionary {“username”: username, “password”, password}. For FILE, a dictionary {“path”: path} specifying the directory where the objects shall be saved. 
 
- Returns
- A list of (object_id, url) tuples for all objects that were uploaded (skipping objects that already exist on the remote). 
 
 
splitgraph.core.output module
- class splitgraph.core.output.Color
- Bases: - object- An enumeration of console colors - BLUE = '\x1b[94m'
 - BOLD = '\x1b[1m'
 - CYAN = '\x1b[96m'
 - DARKCYAN = '\x1b[36m'
 - END = '\x1b[0m'
 - GREEN = '\x1b[92m'
 - PURPLE = '\x1b[95m'
 - RED = '\x1b[91m'
 - UNDERLINE = '\x1b[4m'
 - YELLOW = '\x1b[93m'
 
- class splitgraph.core.output.ResettableStream(stream)
- Bases: - io.RawIOBase- Stream that supports reading from the underlying stream and resetting the position once. - We can’t use fseek() in this case, since we might be reading from a pipe. So, we operate this stream in two modes. In the first mode, we mirror all reads into a separate buffer (while consuming the input stream). After the user calls reset(), we first replay data from the mirrored copy, then continue consuming the input stream (simulating seek(0)). - readable()
- Return whether object was opened for reading. - If False, read() will raise OSError. 
 - readinto(b)
 - reset()
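The mirror-buffer trick described above can be sketched in a few lines (a simplified stand-in, not the actual RawIOBase implementation; reads may return fewer bytes than requested, as is normal for streams):

```python
import io

class MiniResettableStream:
    """Sketch of the mirror-buffer technique: reads are copied into a
    side buffer; after reset(), the buffer is replayed before the
    (possibly non-seekable) source is consumed further. Works once."""

    def __init__(self, stream):
        self.stream = stream
        self.mirror = io.BytesIO()  # copy of everything read so far
        self.replay = None          # buffer being replayed after reset()
        self.mirroring = True       # still recording reads

    def read(self, n: int = -1) -> bytes:
        if self.replay is not None:
            data = self.replay.read(n)
            if data:
                return data
            self.replay = None  # replay exhausted: consume the source
        data = self.stream.read(n)
        if self.mirroring:
            self.mirror.write(data)
        return data

    def reset(self) -> None:
        # Simulate a single seek(0) without seeking the source.
        self.mirror.seek(0)
        self.replay = self.mirror
        self.mirroring = False
```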
 
- splitgraph.core.output.conn_string_to_dict(connection: Optional[str]) Dict[str, Any]
- splitgraph.core.output.parse_date(string: str) datetime.date
- splitgraph.core.output.parse_dt(string: str) datetime.datetime
- splitgraph.core.output.parse_repo_tag_or_hash(value, default='latest')
- splitgraph.core.output.parse_time(string: str) time.struct_time
- splitgraph.core.output.pluralise(word: str, number: int) str
- 1 banana, 2 bananas 
- splitgraph.core.output.pretty_size(size: Union[int, float]) str
- Converts a size in bytes to its string representation (e.g. 1024 -> 1KiB) :param size: Size in bytes 
- splitgraph.core.output.slugify(text: str, max_length: int = 50) str
- splitgraph.core.output.truncate_line(line: str, length: int = 80) str
- Truncates a line to a given length, replacing the remainder with … 
- splitgraph.core.output.truncate_list(items: List[Any], max_entries: int = 10) str
- Print a list, possibly truncating it to the specified number of entries 
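For illustration, minimal reimplementations of two of the helpers above (the library's exact output formats may differ; these are naive sketches):

```python
def pretty_size_sketch(size: float) -> str:
    # Illustrative byte-size formatter, e.g. 1024 -> "1.00 KiB".
    for unit in ["B", "KiB", "MiB", "GiB", "TiB"]:
        if abs(size) < 1024.0 or unit == "TiB":
            return "%.2f %s" % (size, unit)
        size /= 1024.0

def pluralise_sketch(word: str, number: int) -> str:
    # "1 banana, 2 bananas" -- naive: just appends "s".
    return "%d %s%s" % (number, word, "" if number == 1 else "s")
```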
splitgraph.core.registry module
Functions for communicating with the remote Splitgraph catalog
- splitgraph.core.registry.get_info_key(engine: PostgresEngine, key: str) Optional[str]
- Gets a configuration key from the remote registry, used to notify the client of the registry’s capabilities. - Parameters
- engine – Engine 
- key – Key to get 
 
 
- splitgraph.core.registry.set_info_key(engine: PostgresEngine, key: str, value: Union[bool, str]) None
- Sets a configuration value on the remote registry. - Parameters
- engine – Engine 
- key – Key to set 
- value – New value for the key 
 
 
- splitgraph.core.registry.setup_registry_mode(engine: PostgresEngine) None
- Set up access policies/RLS: - Normal users aren’t allowed to create tables/schemata (can’t do checkouts inside of a registry or upload SG objects directly to it) 
- Normal users can’t access the splitgraph_meta schema directly: they’re only supposed to be able to talk to it via stored procedures in splitgraph_api. Those procedures are set up with SECURITY INVOKER (run with those users’ credentials) and what they can access is further restricted by RLS: - images/tables/tags meta tables: can only create/update/delete records where the namespace = user ID 
- objects/object_location tables: same. An object (piece of data) becomes owned by the user that creates it and still remains so even if someone else’s image starts using it. Hence, the original owner can delete or change it (since they control the external location they’ve uploaded it to anyway). 
 
 
splitgraph.core.repository module
Public API for managing images in a Splitgraph repository.
- class splitgraph.core.repository.Repository(namespace: str, repository: str, engine: Optional[splitgraph.engine.postgres.engine.PostgresEngine] = None, object_engine: Optional[splitgraph.engine.postgres.engine.PostgresEngine] = None, object_manager: Optional[splitgraph.core.object_manager.ObjectManager] = None)
- Bases: - object- Splitgraph repository API - commit(image_hash: Optional[str] = None, comment: Optional[str] = None, snap_only: bool = False, chunk_size: Optional[int] = None, split_changeset: bool = False, extra_indexes: Optional[Dict[str, Dict[str, Union[List[str], Dict[str, Dict[str, Any]]]]]] = None, in_fragment_order: Optional[Dict[str, List[str]]] = None, overwrite: bool = False) splitgraph.core.image.Image
- Commits all pending changes to a given repository, creating a new image. - Parameters
- image_hash – Hash of the commit. Chosen by random if unspecified. 
- comment – Optional comment to add to the commit. 
- snap_only – If True, will store the table as a full snapshot instead of delta compression 
- chunk_size – For tables that are stored as snapshots (new tables and where snap_only has been passed), the table will be split into fragments of this many rows. 
- split_changeset – If True, splits the changeset into multiple fragments based on the PK regions spanned by the current table fragments. For example, if the original table consists of 2 fragments, first spanning rows 1-10000, second spanning rows 10001-20000 and the change alters rows 1, 10001 and inserts a row with PK 20001, this will record the change as 3 fragments: one inheriting from the first original fragment, one inheriting from the second and a brand new fragment. This increases the number of fragments in total but means that fewer rows will need to be scanned to satisfy a query. If False, the changeset will be stored as a single fragment inheriting from the last fragment in the table. 
- extra_indexes – Dictionary of {table: index_type: column: index_specific_kwargs}. 
- in_fragment_order – Dictionary of {table: list of columns}. If specified, will sort the data inside each chunk by this/these key(s) for each table. 
- overwrite – If an object already exists, will force recreate it. 
 
- Returns
- The newly created Image object. 
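The split_changeset behaviour can be illustrated with a toy PK-range partitioner (hypothetical helper assuming single-column integer PKs; the real fragment logic is more involved):

```python
from typing import Dict, List, Tuple

def partition_changeset(
    fragment_ranges: List[Tuple[int, int]], changed_pks: List[int]
) -> Dict[int, List[int]]:
    """Toy model of split_changeset=True: group changed PKs by the
    original fragment whose PK range covers them; PKs outside all
    ranges go into a brand new fragment (key -1)."""
    groups: Dict[int, List[int]] = {}
    for pk in changed_pks:
        target = -1  # new fragment by default
        for i, (lo, hi) in enumerate(fragment_ranges):
            if lo <= pk <= hi:
                target = i
                break
        groups.setdefault(target, []).append(pk)
    return groups

# The example from the docstring: fragments spanning 1-10000 and
# 10001-20000; the change touches rows 1 and 10001 and inserts 20001.
groups = partition_changeset([(1, 10000), (10001, 20000)], [1, 10001, 20001])
# -> {0: [1], 1: [10001], -1: [20001]}: three fragments in total.
```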
 
 - commit_engines() None
- Commit the underlying transactions on both engines that the repository uses. 
 - delete(unregister: bool = True, uncheckout: bool = True) None
- Discards all changes to a given repository and optionally all of its history, as well as deleting the Postgres schema that it might be checked out into. Doesn’t delete any cached physical objects. - After performing this operation, this object becomes invalid and must be discarded, unless init() is called again. - Parameters
- unregister – Whether to purge repository history/metadata 
- uncheckout – Whether to delete the actual checked out repo. This has no effect if the repository is backed by a registry (rather than a local engine). 
 
 
 - diff(table_name: str, image_1: Union[splitgraph.core.image.Image, str], image_2: Optional[Union[splitgraph.core.image.Image, str]], aggregate: bool = False) Optional[Union[bool, Tuple[int, int, int], List[Tuple[bool, Tuple]]]]
- Compares the state of a table in different images by materializing both tables into a temporary space and comparing them row-to-row. - Parameters
- table_name – Name of the table. 
- image_1 – First image hash / object. If None, uses the state of the current staging area. 
- image_2 – Second image hash / object. If None, uses the state of the current staging area. 
- aggregate – If True, returns a tuple of integers denoting added, removed and updated rows between the two images. 
 
- Returns
- If the table doesn’t exist in one of the images, returns True if it was added and False if it was removed. If aggregate is True, returns the aggregation of changes as specified before. Otherwise, returns a list of changes where each change is a tuple of (True for added, False for removed, row contents). 
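The non-aggregated diff output can be modelled on plain row tuples (a toy sketch that ignores ordering and duplicate rows):

```python
from typing import List, Tuple

def diff_rows(old: List[tuple], new: List[tuple]) -> List[Tuple[bool, tuple]]:
    # (False, row): row was removed; (True, row): row was added,
    # mirroring the (added?, row contents) change tuples above.
    old_set, new_set = set(old), set(new)
    changes = [(False, r) for r in old if r not in new_set]
    changes += [(True, r) for r in new if r not in old_set]
    return changes

changes = diff_rows([(1, "a"), (2, "b")], [(2, "b"), (3, "c")])
# Summing over the flags approximates the aggregate=True counts
# (an update shows up here as a removal plus an addition).
added = sum(1 for is_add, _ in changes if is_add)
removed = len(changes) - added
```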
 
 - dump(stream: _io.TextIOWrapper, exclude_object_contents: bool = False) None
- Creates an SQL dump with the metadata required for the repository and all of its objects. - Parameters
- stream – Stream to dump the data into. 
- exclude_object_contents – Only dump the metadata but not the actual object contents. 
 
 
 - classmethod from_schema(schema: str) splitgraph.core.repository.Repository
- Convert a Postgres schema name of the format namespace/repository to a Splitgraph repository object. 
 - classmethod from_template(template: splitgraph.core.repository.Repository, namespace: Optional[str] = None, repository: Optional[str] = None, engine: Optional[splitgraph.engine.postgres.engine.PostgresEngine] = None, object_engine: Optional[splitgraph.engine.postgres.engine.PostgresEngine] = None) splitgraph.core.repository.Repository
- Create a Repository from an existing one replacing some of its attributes. 
 - get_all_hashes_tags() List[Tuple[Optional[str], str]]
- Gets all tagged images and their hashes in a given repository. - Returns
- List of (image_hash, tag) 
 
 - get_local_size() int
- Get the actual size used by this repository’s downloaded objects. - This might still be double-counted if the repository shares objects with other repositories. - Returns
- Size of the repository in bytes. 
 
 - get_size() int
- Get the physical size used by the repository’s data, counting objects that are used by multiple images only once. This is calculated from the metadata; the on-disk footprint might be smaller if not all of the repository’s objects have been downloaded. - Returns
- Size of the repository in bytes. 
 
 - has_pending_changes() bool
- Detects if the repository has any pending changes (schema changes, table additions/deletions, content changes). 
 - property head: Optional[splitgraph.core.image.Image]
- Return the HEAD image for the repository or None if the repository isn’t checked out. 
 - property head_strict: splitgraph.core.image.Image
- Return the HEAD image for the repository. Raise an exception if the repository isn’t checked out. 
 - images
- A splitgraph.core.image.ImageManager instance that performs operations (checkout, delete etc.) on this repository’s images.
 - import_tables(tables: Sequence[str], source_repository: splitgraph.core.repository.Repository, source_tables: Sequence[str], image_hash: Optional[str] = None, foreign_tables: bool = False, do_checkout: bool = True, target_hash: Optional[str] = None, table_queries: Optional[Sequence[bool]] = None, parent_hash: Optional[str] = None, wrapper: Optional[str] = 'splitgraph.core.fdw_checkout.QueryingForeignDataWrapper', skip_validation: bool = False) str
- Creates a new commit in target_repository with one or more tables linked to already-existing tables. After this operation, the HEAD of the target repository moves to the new commit and the new tables are materialized. - Parameters
- tables – If not empty, must be the list of the same length as source_tables specifying names to store them under in the target repository. 
- source_repository – Repository to import tables from. 
- source_tables – List of tables to import. If empty, imports all tables. 
- image_hash – Image hash in the source repository to import tables from. Uses the current source HEAD by default. 
- foreign_tables – If True, copies all source tables to create a series of new snapshots instead of treating them as Splitgraph-versioned tables. This is useful for adding brand new tables (for example, from an FDW-mounted table). 
- do_checkout – If False, doesn’t check out the newly created image. 
- target_hash – Hash of the new image that tables is recorded under. If None, gets chosen at random. 
- table_queries – If not [], it’s treated as a Boolean mask showing which entries in the tables list are instead SELECT SQL queries that form the target table. The queries have to be non-schema qualified and work only against tables in the source repository. Each target table created is the result of the respective SQL query. This is committed as a new snapshot. 
- parent_hash – If not None, must be the hash of the image to base the new image on. Existing tables from the parent image are preserved in the new image. If None, the current repository HEAD is used. 
- wrapper – Override the default class for the layered querying foreign data wrapper. 
- skip_validation – Don’t validate SQL used in import statements (used by the Splitfile executor that pre-formats the SQL). 
 
- Returns
- Hash that the new image was stored under. 
 
 - init() None
- Initializes an empty repo with an initial commit (hash 0000…) 
 - materialized_table(table_name: str, image_hash: Optional[str]) Iterator[Tuple[str, str]]
- A context manager that returns a pointer to a read-only materialized table in a given image. The table is deleted on exit from the context manager. - Parameters
- table_name – Name of the table 
- image_hash – Image hash to materialize 
 
- Returns
- (schema, table_name) where the materialized table is located. 
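The context-manager contract (create a temporary materialization, hand back its location, drop it on exit) can be sketched with a stand-in SQL executor; the schema name and helper are hypothetical:

```python
from contextlib import contextmanager
from typing import Callable, Iterator, Tuple

@contextmanager
def materialized_table_sketch(
    run_sql: Callable[[str], None], table_name: str
) -> Iterator[Tuple[str, str]]:
    # run_sql stands in for the engine's SQL executor; "sg_tmp" is a
    # hypothetical staging schema, not Splitgraph's actual one.
    tmp_schema = "sg_tmp"
    run_sql('CREATE TABLE "%s"."%s" (...)' % (tmp_schema, table_name))
    try:
        # The caller reads from the materialized table here.
        yield tmp_schema, table_name
    finally:
        # The table is deleted on exit from the context manager.
        run_sql('DROP TABLE "%s"."%s"' % (tmp_schema, table_name))
```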
 
 - objects
- A splitgraph.core.object_manager.ObjectManager instance that performs operations on the objects on this repository’s engine (not just objects belonging to this repository).
 - pull(download_all: Optional[bool] = False, overwrite_objects: bool = False, overwrite_tags: bool = False, single_image: Optional[str] = None) None
- Synchronizes the state of the local Splitgraph repository with its upstream, optionally downloading all new objects created on the remote. - Parameters
- download_all – If True, downloads all objects and stores them locally. Otherwise, will only download required objects when a table is checked out. 
- overwrite_objects – If True, will overwrite object metadata on the local repository for existing objects. 
- overwrite_tags – If True, will overwrite existing tags. 
- single_image – Limit the download to a single image hash/tag. 
 
 
 - push(remote_repository: Optional[splitgraph.core.repository.Repository] = None, overwrite_objects: bool = False, reupload_objects: bool = False, overwrite_tags: bool = False, handler: str = 'DB', handler_options: Optional[Dict[str, Any]] = None, single_image: Optional[str] = None) splitgraph.core.repository.Repository
- Inverse of - pull: Pushes all local changes to the remote and uploads new objects.- Parameters
- remote_repository – Remote repository to push changes to. If not specified, the current upstream is used. 
- handler – Name of the handler to use to upload objects. Use DB to push them to the remote or S3 to store them in an S3 bucket. 
- overwrite_objects – If True, will overwrite object metadata on the remote repository for existing objects. 
- reupload_objects – If True, will reupload objects for which metadata is uploaded. 
- overwrite_tags – If True, will overwrite existing tags on the remote repository. 
- handler_options – Extra options to pass to the handler. For example, see - splitgraph.hooks.s3.S3ExternalObjectHandler.
- single_image – Limit the upload to a single image hash/tag. 
 
 
 - rollback_engines() None
- Rollback the underlying transactions on both engines that the repository uses. 
 - run_sql(sql: Union[psycopg2.sql.Composed, str], arguments: Optional[Any] = None, return_shape: splitgraph.engine.ResultShape = ResultShape.MANY_MANY) Any
- Execute an arbitrary SQL statement inside of this repository’s checked out schema. 
 - set_tags(tags: Dict[str, Optional[str]]) None
- Sets tags for multiple images. - Parameters
- tags – List of (image_hash, tag) 
 
 - to_schema() str
- Returns the engine schema that this repository gets checked out into. 
 - uncheckout(force: bool = False) None
- Deletes the schema that the repository is checked out into - Parameters
- force – Discards all pending changes to the schema. 
 
 - property upstream
- The remote upstream repository that this local repository tracks. 
 
- splitgraph.core.repository.clone(remote_repository: Union[splitgraph.core.repository.Repository, str], local_repository: Optional[splitgraph.core.repository.Repository] = None, overwrite_objects: bool = False, overwrite_tags: bool = False, download_all: Optional[bool] = False, single_image: Optional[str] = None) splitgraph.core.repository.Repository
- Clones a remote Splitgraph repository or synchronizes remote changes with the local ones. - If the target repository has no set upstream engine, the source repository becomes its upstream. - Parameters
- remote_repository – Remote Repository object to clone or the repository’s name. If a name is passed, the repository will be looked up on the current lookup path in order to find the engine the repository belongs to. 
- local_repository – Local repository to clone into. If None, uses the same name as the remote. 
- download_all – If True, downloads all objects and stores them locally. Otherwise, will only download required objects when a table is checked out. 
- overwrite_objects – If True, will overwrite object metadata on the local repository for existing objects. 
- overwrite_tags – If True, will overwrite existing tags. 
- single_image – If set, only get a single image with this hash/tag from the source. 
 
- Returns
- A locally cloned Repository object. 
 
- splitgraph.core.repository.import_table_from_remote(remote_repository: splitgraph.core.repository.Repository, remote_tables: List[str], remote_image_hash: str, target_repository: splitgraph.core.repository.Repository, target_tables: List[Any], target_hash: Optional[str] = None) None
- Shorthand for importing one or more tables from a yet-uncloned remote. Here, the remote image hash is required, as otherwise we aren’t necessarily able to determine what the remote head is. - Parameters
- remote_repository – Remote Repository object 
- remote_tables – List of remote tables to import 
- remote_image_hash – Image hash to import the tables from 
- target_repository – Target repository to import the tables to 
- target_tables – Target table aliases 
- target_hash – Hash of the image that’s created with the import. Default random. 
 
 
- splitgraph.core.repository.table_exists_at(repository: splitgraph.core.repository.Repository, table_name: str, image: Optional[splitgraph.core.image.Image] = None) bool
- Determines whether a given table exists in a Splitgraph image without checking it out. If image is None, determines whether the table exists in the current staging area. 
splitgraph.core.server module
Routines that are run inside of the engine, here so that they can get type- and syntax-checked.
When inside of an LQFDW shim, these are called directly by the Splitgraph core code to avoid a redundant connection to the engine.
- splitgraph.core.server.delete_object_files(object_id: str)
- splitgraph.core.server.download_object(object_id: str, urls: Tuple[str, str, str])
- splitgraph.core.server.get_object_schema(object_id: str) str
- splitgraph.core.server.get_object_size(object_id: str) int
- splitgraph.core.server.list_objects() List[str]
- splitgraph.core.server.object_exists(object_id: str) bool
- splitgraph.core.server.rename_object_files(old_object_id: str, new_object_id: str)
- splitgraph.core.server.set_object_schema(object_id: str, schema: str)
- splitgraph.core.server.upload_object(object_id: str, urls: Tuple[str, str, str])
- splitgraph.core.server.verify(url: str)
splitgraph.core.table module
Table metadata-related classes.
- class splitgraph.core.table.QueryPlan(table: splitgraph.core.table.Table, quals: Optional[Sequence[Sequence[Tuple[str, str, Any]]]], columns: Sequence[str])
- Bases: - object- Represents the initial query plan (fragments to query) for given columns and qualifiers. 
- class splitgraph.core.table.Table(repository: Repository, image: Image, table_name: str, table_schema: List[splitgraph.core.types.TableColumn], objects: List[str])
- Bases: - object- Represents a Splitgraph table in a given image. Shouldn’t be created directly; use the table-loading methods in the splitgraph.core.image.Image class instead. - get_length() int
- Get the number of rows in this table. - This might be smaller than the total number of rows in all objects belonging to this table as some objects might overwrite each other. - Returns
- Number of rows in table 
 
 - get_query_plan(quals: Optional[Sequence[Sequence[Tuple[str, str, Any]]]], columns: Sequence[str], use_cache: bool = True) splitgraph.core.table.QueryPlan
- Start planning a query (preliminary steps before object downloading, like qualifier filtering). - Parameters
- quals – Qualifiers in CNF form 
- columns – List of columns 
- use_cache – If True, will fetch the plan from the cache for the same qualifiers and columns. 
 
- Returns
- QueryPlan 
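Qualifiers in conjunctive normal form are a list of OR-clauses that are ANDed together, each qual being a (column, operator, value) tuple. A minimal evaluator for this shape (an illustrative sketch; the real fragment filtering works on index metadata, not rows):

```python
import operator
from typing import Any, Dict, Sequence, Tuple

Qual = Tuple[str, str, Any]

# Subset of Postgres comparison operators, mapped to Python equivalents.
OPS = {"=": operator.eq, "<>": operator.ne, ">": operator.gt,
       "<": operator.lt, ">=": operator.ge, "<=": operator.le}

def row_matches(row: Dict[str, Any], quals: Sequence[Sequence[Qual]]) -> bool:
    # Outer list: AND; each inner list: OR.
    return all(
        any(OPS[op](row[col], val) for (col, op, val) in clause)
        for clause in quals
    )

# (a > 5) AND (b = 'x' OR b = 'y')
quals = [[("a", ">", 5)], [("b", "=", "x"), ("b", "=", "y")]]
```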
 
 - get_size() int
- Get the physical size used by the table’s objects (including those shared with other tables). - This is calculated from the metadata; the on-disk footprint might be smaller if not all of the table’s objects have been downloaded. - Returns
- Size of the table in bytes. 
 
 - materialize(destination: str, destination_schema: Optional[str] = None, lq_server: Optional[str] = None, temporary: bool = False) None
- Materializes a Splitgraph table in the target schema as a normal Postgres table, potentially downloading all required objects and using them to reconstruct the table. - Parameters
- destination – Name of the destination table. 
- destination_schema – Name of the destination schema. 
- lq_server – If set, sets up a layered querying FDW for the table using this foreign server instead of materializing it directly. 
 
 
 - query(columns: List[str], quals: Sequence[Sequence[Tuple[str, str, Any]]])
- Run a read-only query against this table without materializing it. - This is a wrapper around query_lazy() that force-evaluates the results, which might materialize more fragments than are needed. - Parameters
- columns – List of columns from this table to fetch 
- quals – List of qualifiers in conjunctive normal form. See the documentation for FragmentManager.filter_fragments for the actual format. 
 
- Returns
- List of dictionaries of results 
 
 - query_indirect(columns: List[str], quals: Optional[Sequence[Sequence[Tuple[str, str, Any]]]]) Tuple[Iterator[bytes], Callable, splitgraph.core.table.QueryPlan]
- Run a read-only query against this table without materializing it. Instead of actual results, this returns a generator of SQL queries that the caller can use to get the results as well as a callback that the caller has to run after they’re done consuming the results. - In particular, the query generator will prefer returning direct queries to Splitgraph objects and only when those are exhausted will it start materializing delta-compressed fragments. - This is an advanced method: you probably want to call table.query(). - Parameters
- columns – List of columns from this table to fetch 
- quals – List of qualifiers in conjunctive normal form. See the documentation for FragmentManager.filter_fragments for the actual format. 
 
- Returns
- Generator of queries (bytes), a callback and a query plan object (containing stats that are fully populated after the callback has been called to end the query). 
 
 - query_lazy(columns: List[str], quals: Sequence[Sequence[Tuple[str, str, Any]]]) Iterator[Iterator[Dict[str, Any]]]
- Run a read-only query against this table without materializing it. - Parameters
- columns – List of columns from this table to fetch 
- quals – List of qualifiers in conjunctive normal form. See the documentation for FragmentManager.filter_fragments for the actual format. 
 
- Returns
- Generator of dictionaries of results. 
 
 - reindex(extra_indexes: Dict[str, Union[List[str], Dict[str, Dict[str, Any]]]], raise_on_patch_objects=True) List[str]
- Run extra indexes on all objects in this table and update their metadata. This only works on objects that don’t have any deletions or upserts (have a deletion hash of 000000…). - Parameters
- extra_indexes – Dictionary of {index_type: column: index_specific_kwargs}. 
- raise_on_patch_objects – If True, will raise an exception if any objects in the table overwrite any other objects. If False, will log a warning but will reindex all non-patch objects. 
 
 - Returns
- List of objects that were reindexed. 
 
- splitgraph.core.table.merge_index_data(current_index: Dict[str, Any], new_index: Dict[str, Any])
splitgraph.core.types module
- class splitgraph.core.types.Comparable
- Bases: - object
- class splitgraph.core.types.MountError(*, table_name: str, error: str, error_text: str)
- Bases: - pydantic.main.BaseModel- error: str
 - error_text: str
 - table_name: str
 
- class splitgraph.core.types.TableColumn(ordinal, name, pg_type, is_pk, comment)
- Bases: - tuple- comment: Optional[str]
- Alias for field number 4 
 - is_pk: bool
- Alias for field number 3 
 - name: str
- Alias for field number 1 
 - ordinal: int
- Alias for field number 0 
 - pg_type: str
- Alias for field number 2 
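TableColumn behaves like any typing.NamedTuple, so fields are accessible both by name and by the positions listed above. A sketch reconstructed from the field list (the default value for comment is an assumption):

```python
from typing import NamedTuple, Optional

# Reconstructed from the field numbers documented above:
# 0=ordinal, 1=name, 2=pg_type, 3=is_pk, 4=comment.
class TableColumn(NamedTuple):
    ordinal: int
    name: str
    pg_type: str
    is_pk: bool
    comment: Optional[str] = None  # default is an assumption

col = TableColumn(1, "id", "integer", True, "primary key")
col.name   # access by name...
col[2]     # ...or by tuple position ("Alias for field number 2")
```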
 
- splitgraph.core.types.dict_to_table_schema_params(tables: Dict[str, ExternalTableRequest]) Dict[str, Tuple[List[splitgraph.core.types.TableColumn], TableParams]]
- splitgraph.core.types.get_table_list(table_info: Union[List[str], Dict[str, Tuple[List[splitgraph.core.types.TableColumn], TableParams]]]) List[str]
- splitgraph.core.types.get_table_params(table_info: Union[List[str], Dict[str, Tuple[List[splitgraph.core.types.TableColumn], TableParams]]], table_name: str) TableParams
- splitgraph.core.types.table_schema_params_to_dict(tables: Dict[str, Tuple[List[splitgraph.core.types.TableColumn], TableParams]]) Dict[str, Dict[str, Dict[str, Any]]]
- splitgraph.core.types.unwrap(result: Dict[str, Union[splitgraph.core.types.MountError, splitgraph.core.types.T]]) Tuple[Dict[str, splitgraph.core.types.T], Dict[str, splitgraph.core.types.MountError]]
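unwrap partitions a mapping of results-or-errors into two mappings. Sketched here with a stand-in for MountError (the stub and function names are hypothetical):

```python
from typing import Any, Dict, Tuple

class MountErrorStub:
    """Stand-in for splitgraph.core.types.MountError."""
    def __init__(self, table_name: str, error: str):
        self.table_name, self.error = table_name, error

def unwrap_sketch(
    result: Dict[str, Any]
) -> Tuple[Dict[str, Any], Dict[str, MountErrorStub]]:
    # Split {name: value-or-error} into (successes, errors),
    # mirroring unwrap's (values, errors) return shape.
    ok = {k: v for k, v in result.items() if not isinstance(v, MountErrorStub)}
    err = {k: v for k, v in result.items() if isinstance(v, MountErrorStub)}
    return ok, err

ok, err = unwrap_sketch({"t1": 42, "t2": MountErrorStub("t2", "boom")})
# ok == {"t1": 42}; err holds the error for "t2"
```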
Module contents
Core Splitgraph functionality: versioning and sharing tables.
The main point of interaction with the Splitgraph API is a splitgraph.core.repository.Repository object representing a local or a remote Splitgraph repository. Repositories can be created using one of the following methods:
Directly by invoking Repository(namespace, name, engine) where engine is the engine that the repository belongs to (which can be obtained with get_engine(engine_name)). If the created repository doesn’t actually exist on the engine, it must first be initialized with repository.init().
By using splitgraph.core.engine.lookup_repository(), which will search for the repository on the current lookup path.