splitgraph.core package
Module contents
Core Splitgraph functionality: versioning and sharing tables.
The main point of interaction with the Splitgraph API is a splitgraph.core.repository.Repository
object representing a local or a remote Splitgraph repository. Repositories can be created using one of the following methods:
Directly by invoking Repository(namespace, name, engine), where engine is the engine that the repository belongs to (it can be obtained with get_engine(engine_name)). If the created repository doesn’t actually exist on the engine, it must first be initialized with repository.init().
By using splitgraph.core.engine.lookup_repository(), which will search for the repository on the current lookup path.
Submodules
splitgraph.core.engine module
Routines for managing Splitgraph engines, including looking up repositories and managing objects.
splitgraph.core.engine.
get_current_repositories
(engine: PostgresEngine) → List[Tuple[Repository, Optional[Image]]]¶Lists all repositories currently in the engine.
- Parameters
engine – Engine
- Returns
List of (Repository object, current HEAD image)
splitgraph.core.engine.
init_engine
(skip_object_handling: bool = False) → None¶Initializes the engine by:
performing any required engine-custom initialization
creating the metadata tables
- Parameters
skip_object_handling – If True, skips installing routines related to object handling and checkouts (like audit triggers and CStore management).
splitgraph.core.engine.
lookup_repository
(name: str, include_local: bool = False) → Repository¶Queries the SG engines on the lookup path to locate one hosting the given repository.
- Parameters
name – Repository name
include_local – If True, also queries the local engine
- Returns
Local or remote Repository object
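The lookup described above can be pictured as a walk over candidate engines. The following is a plain-Python sketch that does not call Splitgraph: the function name find_hosting_engine, the engine names and the hosted mapping are all hypothetical, and both checking the local engine first and returning None when nothing is found are assumptions that the text above does not state.

```python
from typing import Dict, List, Optional, Set


def find_hosting_engine(
    name: str,
    lookup_path: List[str],
    hosted: Dict[str, Set[str]],
    include_local: bool = False,
    local_engine: str = "LOCAL",
) -> Optional[str]:
    """Return the first engine that hosts `name`, or None if none does."""
    # Assumption: the local engine, when included, is consulted first.
    candidates = ([local_engine] if include_local else []) + lookup_path
    for engine in candidates:
        if name in hosted.get(engine, set()):
            return engine
    return None


# Hypothetical engines and the repositories each one hosts.
hosted = {
    "LOCAL": {"ns/local_only"},
    "remote_a": {"ns/shared"},
    "remote_b": {"ns/shared", "ns/other"},
}
lookup_path = ["remote_a", "remote_b"]

found = find_hosting_engine("ns/shared", lookup_path, hosted)
missing = find_hosting_engine("ns/local_only", lookup_path, hosted)
local = find_hosting_engine("ns/local_only", lookup_path, hosted, include_local=True)
```

With include_local left at its default, a repository that exists only on the local engine is not found.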
splitgraph.core.engine.
repository_exists
(repository: Repository) → bool¶Checks if a repository exists on the engine.
- Parameters
repository – Repository object
splitgraph.core.image module
Image representation and provenance
- class
splitgraph.core.image.
Image
(image_hash: str, parent_id: Optional[str], created: datetime.datetime, comment: str, provenance_data: List[Dict[str, Union[str, List[str], List[bool], List[Dict[str, str]]]]], repository: Repository)¶ Bases:
tuple
Represents a Splitgraph image. Shouldn’t be created directly; use the Image-loading methods in the splitgraph.core.repository.Repository class instead.
checkout
(force: bool = False, layered: bool = False) → None¶Checks the image out, changing the current HEAD pointer. Raises an error if there are pending changes to its checkout.
- Parameters
force – Discards all pending changes to the schema.
layered – If True, uses layered querying to check out the image (doesn’t materialize tables inside of it).
- property
comment
¶ Alias for field number 3
- property
created
¶ Alias for field number 2
delete_tag
(tag: str) → None¶Deletes a tag from an image.
- Parameters
tag – Tag to delete.
- property
engine
¶
get_log
() → List[splitgraph.core.image.Image]¶Repeatedly gets the parent of a given image until it reaches the bottom.
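The parent walk can be sketched in plain Python without Splitgraph. Here images are reduced to hash strings, parents is a hypothetical mapping from an image to its parent, and the result is assumed to start with the image itself.

```python
from typing import Dict, List, Optional


def walk_parents(start: str, parents: Dict[str, Optional[str]]) -> List[str]:
    """Follow parent pointers from `start` until an image with no parent."""
    log = [start]
    while True:
        parent = parents[log[-1]]
        if parent is None:
            return log
        log.append(parent)


# Hypothetical three-image history: c3 -> c2 -> c1 (root).
parents = {"c1": None, "c2": "c1", "c3": "c2"}
log = walk_parents("c3", parents)
```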
get_parent_children
() → Tuple[Optional[str], List[str]]¶Gets the parent and a list of children of a given image.
get_size
() → int¶Get the physical size used by the image’s objects (including those that might be shared with other images).
This is calculated from the metadata; the on-disk footprint might be smaller if not all of the image’s objects have been downloaded.
- Returns
Size of the image in bytes.
get_table
(table_name: str) → splitgraph.core.table.Table¶Returns a Table object representing a version of a given table. Contains a list of objects that the table is linked to and the table’s schema.
- Parameters
table_name – Name of the table
- Returns
Table object
get_tables
() → List[str]¶Gets the names of all tables inside of an image.
Lists all tags that this image has.
- property
image_hash
¶ Alias for field number 0
- property
object_engine
¶
- property
parent_id
¶ Alias for field number 1
provenance
(reverse=False, engine=None) → List[Tuple[Repository, str]]¶Inspects the image’s parent chain to come up with a set of repositories and their hashes that it was created from.
If reverse is True, returns a list of images that were created from this image. If this image is on a remote repository, engine can be passed in to override the engine used for the lookup of dependents.
- Returns
List of (repository, image_hash)
- property
provenance_data
¶ Alias for field number 4
query_schema
(wrapper: Optional[str] = 'splitgraph.core.fdw_checkout.QueryingForeignDataWrapper') → Iterator[str]¶Creates a temporary schema with tables in this image mounted as foreign tables that can be accessed via read-only layered querying. On exit from the context manager, the schema is discarded.
- Returns
The name of the schema the image is located in.
- property
repository
¶ Alias for field number 5
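Because Image derives from tuple, each field can be read both by name and by position, which is what the "Alias for field number N" entries describe. A minimal sketch of that layout using a plain NamedTuple follows. The class name ImageSketch and the sample values are illustrative and not part of Splitgraph, and the repository field is a plain string here rather than a Repository object.

```python
from datetime import datetime
from typing import Any, Dict, List, NamedTuple, Optional


class ImageSketch(NamedTuple):
    """Stand-in with the same field order as the documented Image tuple."""

    image_hash: str                        # field 0
    parent_id: Optional[str]               # field 1
    created: datetime                      # field 2
    comment: str                           # field 3
    provenance_data: List[Dict[str, Any]]  # field 4
    repository: Any                        # field 5


image = ImageSketch(
    image_hash="a" * 64,
    parent_id=None,
    created=datetime(2020, 1, 1),
    comment="initial import",
    provenance_data=[],
    repository="example_namespace/example_repo",
)

# Named and positional access return the same values.
same_hash = image.image_hash == image[0]
same_comment = image.comment == image[3]
```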
set_provenance
(provenance_data: List[Dict[str, Union[str, List[str], List[bool], List[Dict[str, str]]]]]) → None¶Sets the image’s provenance. Internal function called by the Splitfile interpreter, shouldn’t be called directly as it changes the image after it’s been created.
- Parameters
provenance_data – List of parsed Splitfile commands and their data.
tag
(tag: str) → None¶Tags a given image. All tags are unique inside of a repository. If a tag already exists, it’s removed from the previous image and given to the new image.
- Parameters
tag – Tag to set. ‘latest’ and ‘HEAD’ are reserved tags.
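The uniqueness rule behaves like a dictionary keyed by tag: assigning an existing tag to another image moves it. The sketch below is plain Python and does not call Splitgraph. The tags mapping and the image names are hypothetical, and it does not model the reserved tags.

```python
from typing import Dict


def set_tag(tags: Dict[str, str], image_hash: str, tag: str) -> None:
    """Point `tag` at `image_hash`. A key holds one value only, so any
    image that previously carried the tag loses it."""
    tags[tag] = image_hash


tags: Dict[str, str] = {}
set_tag(tags, "image_a", "v1")
set_tag(tags, "image_b", "v1")  # moves the tag from image_a to image_b
set_tag(tags, "image_a", "v0")
```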
to_splitfile
(ignore_irreproducible: bool = False, source_replacement: Optional[Dict[Repository, str]] = None) → List[str]¶Recreate the Splitfile that can be used to reconstruct this image.
- Parameters
ignore_irreproducible – If True, ignore commands from irreproducible Splitfile lines (like MOUNT or custom commands) and instead emit a comment (this results in an invalid Splitfile).
source_replacement – A dictionary of repositories and image hashes/tags specifying how to replace the dependencies of this Splitfile (table imports and FROM commands).
- Returns
A list of Splitfile commands that can be fed back into the executor.
splitgraph.core.image.
getrandbits
(k) → x. Generates an int with k random bits.¶
splitgraph.core.image.
reconstruct_splitfile
(provenance_data: List[Dict[str, Union[str, List[str], List[bool], List[Dict[str, str]]]]], ignore_irreproducible: bool = False, source_replacement: Optional[Dict[Repository, str]] = None) → List[str]¶Recreate the Splitfile that can be used to reconstruct an image.
splitgraph.core.object_manager module
Functions related to creating, deleting and keeping track of physical Splitgraph objects.
- class
splitgraph.core.object_manager.
ObjectManager
(object_engine: PostgresEngine, metadata_engine: Optional[PostgresEngine] = None)¶ Bases:
splitgraph.core.fragment_manager.FragmentManager
Brings the multiple manager classes together and manages the object cache (downloading and uploading objects as required in order to fulfill certain queries)
cleanup
() → List[str]¶Deletes all objects in the object_tree not required by any current repository, including their dependencies and their remote locations. Also deletes all objects not registered in the object_tree.
download_objects
(source: Optional[ObjectManager], objects_to_fetch: List[str], object_locations: List[Tuple[str, str, str]]) → List[str]¶Fetches the required objects from the remote and stores them locally. Does nothing for objects that already exist.
- Parameters
source – Remote ObjectManager. If None, will only try to download objects from the external location.
objects_to_fetch – List of object IDs to download.
object_locations – List of custom object locations, encoded as tuples (object_id, object_url, protocol).
ensure_objects
(table: Optional[Table], objects: Optional[List[str]] = None, quals: Optional[Sequence[Sequence[Tuple[str, str, Any]]]] = None, defer_release: bool = False, tracer: Optional[splitgraph.core.common.Tracer] = None, upstream_manager: Optional[ObjectManager] = None) → Iterator[Union[List[str], Tuple[List[str], splitgraph.core.common.CallbackList]]]¶Resolves the objects needed to materialize a given table and makes sure they are in the local splitgraph_meta schema.
Whilst inside this manager, the objects are guaranteed to exist. On exit from it, the objects are marked as unneeded and can be garbage collected.
- Parameters
table – Table to materialize
objects – List of objects to download: one of table or objects must be specified.
quals – Optional list of qualifiers to be passed to the fragment engine. Fragments that definitely do not match these qualifiers will be dropped. See the docstring for filter_fragments for the format.
defer_release – If True, won’t release the objects on exit.
- Returns
If defer_release is True: List of table fragments and a callback that the caller must call when the objects are no longer needed. If defer_release is False: just the list of table fragments.
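The two return shapes can be illustrated with a small reference-counting context manager. This is a plain-Python sketch under stated assumptions: hold_objects and the module-level refcounts dictionary are hypothetical, nothing is downloaded, and the use of reference counts is inferred from the run_eviction description rather than taken from the implementation.

```python
from contextlib import contextmanager
from typing import Callable, Dict, Iterator, List, Tuple, Union

refcounts: Dict[str, int] = {}


@contextmanager
def hold_objects(
    objects: List[str], defer_release: bool = False
) -> Iterator[Union[List[str], Tuple[List[str], Callable[[], None]]]]:
    """Keep `objects` referenced while the caller is inside the block."""
    for obj in objects:
        refcounts[obj] = refcounts.get(obj, 0) + 1

    def release() -> None:
        for obj in objects:
            refcounts[obj] -= 1

    if defer_release:
        # The caller becomes responsible for invoking the callback.
        yield objects, release
    else:
        try:
            yield objects
        finally:
            release()


with hold_objects(["o1", "o2"]) as held:
    inside = dict(refcounts)
after = dict(refcounts)

with hold_objects(["o1"], defer_release=True) as (held_deferred, release_cb):
    pass
still_held = refcounts["o1"]
release_cb()
finally_released = refcounts["o1"]
```

In the deferred case the count stays above zero after the block ends, so the objects remain protected until the callback runs.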
get_cache_occupancy
() → int¶- Returns
Space occupied by objects cached from external locations, in bytes.
get_downloaded_objects
(limit_to: Optional[List[str]] = None) → List[str]¶Gets a list of objects currently in the Splitgraph cache (i.e. not only existing externally.)
- Parameters
limit_to – If specified, only the objects in this list will be returned.
- Returns
Set of object IDs.
get_total_object_size
()¶- Returns
Space occupied by all objects on the engine, in bytes.
make_objects_external
(objects: List[str], handler: str, handler_params: Dict[Any, Any]) → None¶Uploads local objects to an external location and marks them as being cached locally (thus making it possible to evict or swap them out).
- Parameters
objects – Object IDs to upload. Will do nothing for objects that already exist externally.
handler – Object handler
handler_params – Extra handler parameters
run_eviction
(keep_objects: List[str], required_space: Optional[int] = None) → None¶Delete enough objects with zero reference count (only those, since we guarantee that whilst refcount is >0, the object stays alive) to free at least required_space in the cache.
- Parameters
keep_objects – List of objects (besides those with nonzero refcount) that can’t be deleted.
required_space – Space, in bytes, to free. If the routine can’t free at least this much space, it shall raise an exception. If None, removes all eligible objects.
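The eviction contract (only zero-refcount objects, never those in keep_objects, fail if not enough space can be freed) can be sketched in plain Python. The function choose_evictions and its inputs are hypothetical, and the largest-first candidate order is an assumption of this sketch, not a documented Splitgraph policy.

```python
from typing import Dict, List, Optional


def choose_evictions(
    sizes: Dict[str, int],
    refcounts: Dict[str, int],
    keep_objects: List[str],
    required_space: Optional[int] = None,
) -> List[str]:
    """Pick objects to delete so that at least `required_space` is freed."""
    keep = set(keep_objects)
    # Only unreferenced objects that the caller did not ask to keep.
    candidates = sorted(
        (o for o in sizes if refcounts.get(o, 0) == 0 and o not in keep),
        key=lambda o: sizes[o],
        reverse=True,  # assumption: evict the largest objects first
    )
    if required_space is None:
        return candidates  # remove every eligible object

    chosen: List[str] = []
    freed = 0
    for obj in candidates:
        if freed >= required_space:
            break
        chosen.append(obj)
        freed += sizes[obj]
    if freed < required_space:
        raise RuntimeError(f"can only free {freed} of {required_space} bytes")
    return chosen


sizes = {"o1": 100, "o2": 300, "o3": 200, "o4": 50}
refcounts = {"o1": 0, "o2": 1, "o3": 0, "o4": 0}
evicted = choose_evictions(sizes, refcounts, keep_objects=["o4"], required_space=250)
```

Here o2 is skipped because it is still referenced and o4 because the caller asked to keep it.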
upload_objects
(target: splitgraph.core.object_manager.ObjectManager, objects_to_push: List[str], handler: str = 'DB', handler_params: Optional[Dict[Any, Any]] = None) → Sequence[Tuple[str, Optional[str]]]¶Uploads physical objects to the remote or some other external location.
- Parameters
target – Target ObjectManager
objects_to_push – List of object IDs to upload.
handler – Name of the handler to use to upload objects. Use DB to push them to the remote, FILE to store them in a directory that can be accessed from the client and HTTP to upload them to HTTP.
handler_params – For HTTP, a dictionary {“username”: username, “password”: password}. For FILE, a dictionary {“path”: path} specifying the directory where the objects shall be saved.
- Returns
A list of (object_id, url) that specifies all objects that were uploaded (skipping objects that already exist on the remote).
splitgraph.core.registry module
Functions for communicating with the remote Splitgraph catalog
splitgraph.core.registry.
get_info_key
(engine: PostgresEngine, key: str) → Optional[str]¶Gets a configuration key from the remote registry, used to notify the client of the registry’s capabilities.
- Parameters
engine – Engine
key – Key to get
splitgraph.core.registry.
set_info_key
(engine: PostgresEngine, key: str, value: Union[bool, str]) → None¶Sets a configuration value on the remote registry.
- Parameters
engine – Engine
key – Key to set
value – New value for the key
splitgraph.core.registry.
setup_registry_mode
(engine: PostgresEngine) → None¶Set up access policies/RLS:
Normal users aren’t allowed to create tables/schemata (can’t do checkouts inside of a registry or upload SG objects directly to it)
Normal users can’t access the splitgraph_meta schema directly: they’re only supposed to be able to talk to it via stored procedures in splitgraph_api. Those procedures are set up with SECURITY INVOKER (run with those users’ credentials) and what they can access is further restricted by RLS:
images/tables/tags meta tables: can only create/update/delete records where the namespace = user ID
objects/object_location tables: same. An object (piece of data) becomes owned by the user that creates it and still remains so even if someone else’s image starts using it. Hence, the original owner can delete or change it (since they control the external location they’ve uploaded it to anyway).
splitgraph.core.repository module
Public API for managing images in a Splitgraph repository.
- class
splitgraph.core.repository.
Repository
(namespace: str, repository: str, engine: Optional[splitgraph.engine.postgres.engine.PostgresEngine] = None, object_engine: Optional[splitgraph.engine.postgres.engine.PostgresEngine] = None, object_manager: Optional[splitgraph.core.object_manager.ObjectManager] = None)¶ Bases:
object
Splitgraph repository API
commit
(image_hash: Optional[str] = None, comment: Optional[str] = None, snap_only: bool = False, chunk_size: Optional[int] = None, split_changeset: bool = False, extra_indexes: Optional[Dict[str, Dict[str, Union[List[str], Dict[str, Dict[str, Any]]]]]] = None, in_fragment_order: Optional[Dict[str, List[str]]] = None, overwrite: bool = False) → splitgraph.core.image.Image¶Commits all pending changes to a given repository, creating a new image.
- Parameters
image_hash – Hash of the commit. Chosen at random if unspecified.
comment – Optional comment to add to the commit.
snap_only – If True, will store the table as a full snapshot instead of delta compression
chunk_size – For tables that are stored as snapshots (new tables and where snap_only has been passed), the table will be split into fragments of this many rows.
split_changeset – If True, splits the changeset into multiple fragments based on the PK regions spanned by the current table fragments. For example, if the original table consists of 2 fragments, first spanning rows 1-10000, second spanning rows 10001-20000 and the change alters rows 1, 10001 and inserts a row with PK 20001, this will record the change as 3 fragments: one inheriting from the first original fragment, one inheriting from the second and a brand new fragment. This increases the number of fragments in total but means that fewer rows will need to be scanned to satisfy a query. If False, the changeset will be stored as a single fragment inheriting from the last fragment in the table.
extra_indexes – Dictionary of {table: index_type: column: index_specific_kwargs}.
in_fragment_order – Dictionary of {table: list of columns}. If specified, will sort the data inside each chunk by this/these key(s) for each table.
overwrite – If an object already exists, will force recreate it.
- Returns
The newly created Image object.
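The split_changeset example in the parameter list (two fragments covering PKs 1-10000 and 10001-20000, a change touching PKs 1, 10001 and 20001) can be worked through in plain Python. This sketch does not call Splitgraph: split_changes and the fragment names are hypothetical, and primary keys are assumed to be single integers.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def split_changes(
    fragments: Dict[str, Tuple[int, int]], changed_pks: List[int]
) -> Dict[Optional[str], List[int]]:
    """Group changed PKs by the existing fragment whose range contains them.

    PKs outside every range are grouped under None, standing for a brand
    new fragment.
    """
    groups: Dict[Optional[str], List[int]] = defaultdict(list)
    for pk in changed_pks:
        owner = next(
            (name for name, (lo, hi) in fragments.items() if lo <= pk <= hi),
            None,
        )
        groups[owner].append(pk)
    return dict(groups)


fragments = {"frag_1": (1, 10000), "frag_2": (10001, 20000)}
groups = split_changes(fragments, [1, 10001, 20001])
```

The three resulting groups correspond to the three fragments in the description: one per original fragment plus a new one for PK 20001.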
commit_engines
() → None¶Commit the underlying transactions on both engines that the repository uses.
delete
(unregister: bool = True, uncheckout: bool = True) → None¶Discards all changes to a given repository and optionally all of its history, as well as deleting the Postgres schema that it might be checked out into. Doesn’t delete any cached physical objects.
After performing this operation, this object becomes invalid and must be discarded, unless init() is called again.
- Parameters
unregister – Whether to purge repository history/metadata
uncheckout – Whether to delete the actual checked out repo. This has no effect if the repository is backed by a registry (rather than a local engine).
diff
(table_name: str, image_1: Union[splitgraph.core.image.Image, str], image_2: Optional[Union[splitgraph.core.image.Image, str]], aggregate: bool = False) → Optional[Union[bool, Tuple[int, int, int], List[Tuple[bool, Tuple]]]]¶Compares the state of a table in different images by materializing both tables into a temporary space and comparing them row-to-row.
- Parameters
table_name – Name of the table.
image_1 – First image hash / object. If None, uses the state of the current staging area.
image_2 – Second image hash / object. If None, uses the state of the current staging area.
aggregate – If True, returns a tuple of integers denoting added, removed and updated rows between the two images.
- Returns
If the table doesn’t exist in one of the images, returns True if it was added and False if it was removed. If aggregate is True, returns the aggregation of changes as specified before. Otherwise, returns a list of changes where each change is a tuple of (True for added, False for removed, row contents).
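The two result shapes can be illustrated with a plain-Python comparison of two versions of a table. This is a sketch and not Splitgraph's implementation: rows are keyed by an integer primary key, and representing an updated row as a removal followed by an addition in the non-aggregated form is an assumption.

```python
from typing import Dict, List, Tuple

Row = Tuple
Rows = Dict[int, Row]  # primary key -> full row


def diff_rows(old: Rows, new: Rows) -> List[Tuple[bool, Row]]:
    """List changes as (True for added / False for removed, row contents)."""
    changes: List[Tuple[bool, Row]] = []
    for pk in sorted(old.keys() | new.keys()):
        if old.get(pk) == new.get(pk):
            continue
        if pk in old:
            changes.append((False, old[pk]))
        if pk in new:
            changes.append((True, new[pk]))
    return changes


def diff_aggregate(old: Rows, new: Rows) -> Tuple[int, int, int]:
    """Count (added, removed, updated) rows."""
    added = len(new.keys() - old.keys())
    removed = len(old.keys() - new.keys())
    updated = sum(1 for pk in old.keys() & new.keys() if old[pk] != new[pk])
    return added, removed, updated


old = {1: (1, "apple"), 2: (2, "pear"), 3: (3, "plum")}
new = {1: (1, "apple"), 2: (2, "peach"), 4: (4, "fig")}
changes = diff_rows(old, new)
summary = diff_aggregate(old, new)
```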
dump
(stream: _io.TextIOWrapper, exclude_object_contents: bool = False) → None¶Creates an SQL dump with the metadata required for the repository and all of its objects.
- Parameters
stream – Stream to dump the data into.
exclude_object_contents – Only dump the metadata but not the actual object contents.
- classmethod
from_schema
(schema: str) → splitgraph.core.repository.Repository¶ Convert a Postgres schema name of the format namespace/repository to a Splitgraph repository object.
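The namespace/repository naming convention can be sketched as a pair of string helpers. These are hypothetical stand-ins (parse_schema, format_schema) that return plain strings rather than a Repository, and treating a name without a slash as having an empty namespace is an assumption.

```python
from typing import Tuple


def parse_schema(schema: str) -> Tuple[str, str]:
    """Split 'namespace/repository' into (namespace, repository)."""
    if "/" in schema:
        namespace, repository = schema.split("/", 1)
        return namespace, repository
    # Assumption: no slash means the repository has no namespace.
    return "", schema


def format_schema(namespace: str, repository: str) -> str:
    """Inverse of parse_schema."""
    return f"{namespace}/{repository}" if namespace else repository


parsed = parse_schema("example_namespace/example_repo")
round_trip = format_schema(*parsed)
```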
- classmethod
from_template
(template: splitgraph.core.repository.Repository, namespace: Optional[str] = None, repository: Optional[str] = None, engine: Optional[splitgraph.engine.postgres.engine.PostgresEngine] = None, object_engine: Optional[splitgraph.engine.postgres.engine.PostgresEngine] = None) → splitgraph.core.repository.Repository¶ Create a Repository from an existing one replacing some of its attributes.
Gets all tagged images and their hashes in a given repository.
- Returns
List of (image_hash, tag)
get_local_size
() → int¶Get the actual size used by this repository’s downloaded objects.
This might still be double-counted if the repository shares objects with other repositories.
- Returns
Size of the repository in bytes.
get_size
() → int¶Get the physical size used by the repository’s data, counting objects that are used by multiple images only once. This is calculated from the metadata; the on-disk footprint might be smaller if not all of the repository’s objects have been downloaded.
- Returns
Size of the repository in bytes.
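Counting shared objects once amounts to summing sizes over the set of distinct objects. A plain-Python sketch with hypothetical image and object names (it does not read any Splitgraph metadata):

```python
from typing import Dict, List


def repository_size(image_objects: Dict[str, List[str]], sizes: Dict[str, int]) -> int:
    """Sum object sizes, counting an object used by several images once."""
    unique = {obj for objs in image_objects.values() for obj in objs}
    return sum(sizes[obj] for obj in unique)


# o2 is shared by both images.
image_objects = {"image_1": ["o1", "o2"], "image_2": ["o2", "o3"]}
sizes = {"o1": 10, "o2": 20, "o3": 30}

total = repository_size(image_objects, sizes)
# Adding up per-image sizes instead would count o2 twice.
naive = sum(sizes[o] for objs in image_objects.values() for o in objs)
```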
has_pending_changes
() → bool¶Detects if the repository has any pending changes (schema changes, table additions/deletions, content changes).
- property
head
¶ Return the HEAD image for the repository or None if the repository isn’t checked out.
- property
head_strict
¶ Return the HEAD image for the repository. Raise an exception if the repository isn’t checked out.
import_tables
(tables: Sequence[str], source_repository: splitgraph.core.repository.Repository, source_tables: Sequence[str], image_hash: Optional[str] = None, foreign_tables: bool = False, do_checkout: bool = True, target_hash: Optional[str] = None, table_queries: Optional[Sequence[bool]] = None, parent_hash: Optional[str] = None, wrapper: Optional[str] = 'splitgraph.core.fdw_checkout.QueryingForeignDataWrapper', skip_validation: bool = False) → str¶Creates a new commit in target_repository with one or more tables linked to already-existing tables. After this operation, the HEAD of the target repository moves to the new commit and the new tables are materialized.
- Parameters
tables – If not empty, must be the list of the same length as source_tables specifying names to store them under in the target repository.
source_repository – Repository to import tables from.
source_tables – List of tables to import. If empty, imports all tables.
image_hash – Image hash in the source repository to import tables from. Uses the current source HEAD by default.
foreign_tables – If True, copies all source tables to create a series of new snapshots instead of treating them as Splitgraph-versioned tables. This is useful for adding brand new tables (for example, from an FDW-mounted table).
do_checkout – If False, doesn’t check out the newly created image.
target_hash – Hash of the new image that the tables are recorded under. If None, gets chosen at random.
table_queries – If not [], it’s treated as a Boolean mask showing which entries in the tables list are instead SELECT SQL queries that form the target table. The queries have to be non-schema qualified and work only against tables in the source repository. Each target table created is the result of the respective SQL query. This is committed as a new snapshot.
parent_hash – If not None, must be the hash of the image to base the new image on. Existing tables from the parent image are preserved in the new image. If None, the current repository HEAD is used.
wrapper – Override the default class for the layered querying foreign data wrapper.
skip_validation – Don’t validate SQL used in import statements (used by the Splitfile executor that pre-formats the SQL).
- Returns
Hash that the new image was stored under.
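The interaction between tables, source_tables and table_queries can be sketched as a position-by-position pairing. This is plain Python and does not call Splitgraph: plan_imports is hypothetical, and the sketch assumes that a True entry in the mask means the matching source_tables item is a SELECT statement.

```python
from typing import List, Optional, Sequence, Tuple


def plan_imports(
    tables: Sequence[str],
    source_tables: Sequence[str],
    table_queries: Optional[Sequence[bool]] = None,
) -> List[Tuple[str, str, str]]:
    """Pair every target name with its source and say how it is produced."""
    # An empty or missing mask means that every entry is a plain table.
    mask = list(table_queries) if table_queries else [False] * len(source_tables)
    if not (len(tables) == len(source_tables) == len(mask)):
        raise ValueError("tables, source_tables and table_queries must align")
    return [
        (target, "query" if is_query else "table", source)
        for target, source, is_query in zip(tables, source_tables, mask)
    ]


plan = plan_imports(
    tables=["fruits", "big_fruits"],
    source_tables=["fruits", "SELECT * FROM fruits WHERE weight > 100"],
    table_queries=[False, True],
)
```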
init
() → None¶Initializes an empty repo with an initial commit (hash 0000…)
materialized_table
(table_name: str, image_hash: Optional[str]) → Iterator[Tuple[str, str]]¶A context manager that returns a pointer to a read-only materialized table in a given image. The table is deleted on exit from the context manager.
- Parameters
table_name – Name of the table
image_hash – Image hash to materialize
- Returns
(schema, table_name) where the materialized table is located.
pull
(download_all: Optional[bool] = False, overwrite_objects: bool = False, overwrite_tags: bool = False, single_image: Optional[str] = None) → None¶Synchronizes the state of the local Splitgraph repository with its upstream, optionally downloading all new objects created on the remote.
- Parameters
download_all – If True, downloads all objects and stores them locally. Otherwise, will only download required objects when a table is checked out.
overwrite_objects – If True, will overwrite object metadata on the local repository for existing objects.
overwrite_tags – If True, will overwrite existing tags.
single_image – Limit the download to a single image hash/tag.
push
(remote_repository: Optional[Repository] = None, overwrite_objects: bool = False, reupload_objects: bool = False, overwrite_tags: bool = False, handler: str = 'DB', handler_options: Optional[Dict[str, Any]] = None, single_image: Optional[str] = None) → splitgraph.core.repository.Repository¶Inverse of
pull
: Pushes all local changes to the remote and uploads new objects.
- Parameters
remote_repository – Remote repository to push changes to. If not specified, the current upstream is used.
handler – Name of the handler to use to upload objects. Use DB to push them to the remote or S3 to store them in an S3 bucket.
overwrite_objects – If True, will overwrite object metadata on the remote repository for existing objects.
reupload_objects – If True, will reupload objects for which metadata is uploaded.
overwrite_tags – If True, will overwrite existing tags on the remote repository.
handler_options – Extra options to pass to the handler. For example, see splitgraph.hooks.s3.S3ExternalObjectHandler.
single_image – Limit the upload to a single image hash/tag.
rollback_engines
() → None¶Rollback the underlying transactions on both engines that the repository uses.
run_sql
(sql: Union[psycopg2.sql.Composed, str], arguments: Optional[Any] = None, return_shape: splitgraph.engine.ResultShape = <ResultShape.MANY_MANY: 4>) → Any¶Execute an arbitrary SQL statement inside of this repository’s checked out schema.
Sets tags for multiple images.
- Parameters
tags – List of (image_hash, tag)
to_schema
() → str¶Returns the engine schema that this repository gets checked out into.
uncheckout
(force: bool = False) → None¶Deletes the schema that the repository is checked out into
- Parameters
force – Discards all pending changes to the schema.
- property
upstream
¶ The remote upstream repository that this local repository tracks.
splitgraph.core.repository.
clone
(remote_repository: Union[Repository, str], local_repository: Optional[Repository] = None, overwrite_objects: bool = False, overwrite_tags: bool = False, download_all: Optional[bool] = False, single_image: Optional[str] = None) → splitgraph.core.repository.Repository¶Clones a remote Splitgraph repository or synchronizes remote changes with the local ones.
If the target repository has no set upstream engine, the source repository becomes its upstream.
- Parameters
remote_repository – Remote Repository object to clone or the repository’s name. If a name is passed, the repository will be looked up on the current lookup path in order to find the engine the repository belongs to.
local_repository – Local repository to clone into. If None, uses the same name as the remote.
download_all – If True, downloads all objects and stores them locally. Otherwise, will only download required objects when a table is checked out.
overwrite_objects – If True, will overwrite object metadata on the local repository for existing objects.
overwrite_tags – If True, will overwrite existing tags.
single_image – If set, only get a single image with this hash/tag from the source.
- Returns
A locally cloned Repository object.
splitgraph.core.repository.
getrandbits
(k) → x. Generates an int with k random bits.¶
splitgraph.core.repository.
import_table_from_remote
(remote_repository: splitgraph.core.repository.Repository, remote_tables: List[str], remote_image_hash: str, target_repository: splitgraph.core.repository.Repository, target_tables: List[Any], target_hash: str = None) → None¶Shorthand for importing one or more tables from a yet-uncloned remote. Here, the remote image hash is required, as otherwise we aren’t necessarily able to determine what the remote head is.
- Parameters
remote_repository – Remote Repository object
remote_tables – List of remote tables to import
remote_image_hash – Image hash to import the tables from
target_repository – Target repository to import the tables to
target_tables – Target table aliases
target_hash – Hash of the image that’s created with the import. Default random.
splitgraph.core.repository.
table_exists_at
(repository: splitgraph.core.repository.Repository, table_name: str, image: Optional[splitgraph.core.image.Image] = None) → bool¶Determines whether a given table exists in a Splitgraph image without checking it out. If image is None, determines whether the table exists in the current staging area.
splitgraph.core.table module
Table metadata-related classes.
- class
splitgraph.core.table.
QueryPlan
(table: splitgraph.core.table.Table, quals: Optional[Sequence[Sequence[Tuple[str, str, Any]]]], columns: Sequence[str])¶ Bases:
object
Represents the initial query plan (fragments to query) for given columns and qualifiers.
- class
splitgraph.core.table.
Table
(repository: Repository, image: Image, table_name: str, table_schema: List[splitgraph.core.types.TableColumn], objects: List[str])¶ Bases:
object
Represents a Splitgraph table in a given image. Shouldn’t be created directly; use the Table-loading methods in the splitgraph.core.image.Image class instead.
get_length
() → int¶Get the number of rows in this table.
This might be smaller than the total number of rows in all objects belonging to this table as some objects might overwrite each other.
- Returns
Number of rows in table
get_query_plan
(quals: Optional[Sequence[Sequence[Tuple[str, str, Any]]]], columns: Sequence[str], use_cache: bool = True) → splitgraph.core.table.QueryPlan¶Start planning a query (preliminary steps before object downloading, like qualifier filtering).
- Parameters
quals – Qualifiers in CNF form
columns – List of columns
use_cache – If True, will fetch the plan from the cache for the same qualifiers and columns.
- Returns
QueryPlan
get_size
() → int¶Get the physical size used by the table’s objects (including those shared with other tables).
This is calculated from the metadata; the on-disk footprint might be smaller if not all of the table’s objects have been downloaded.
- Returns
Size of the table in bytes.
materialize
(destination: str, destination_schema: Optional[str] = None, lq_server: Optional[str] = None) → None¶Materializes a Splitgraph table in the target schema as a normal Postgres table, potentially downloading all required objects and using them to reconstruct the table.
- Parameters
destination – Name of the destination table.
destination_schema – Name of the destination schema.
lq_server – If set, sets up a layered querying FDW for the table using this foreign server instead of materializing it.
query
(columns: List[str], quals: Sequence[Sequence[Tuple[str, str, Any]]])¶Run a read-only query against this table without materializing it.
This is a wrapper around query_lazy() that forces evaluation of the results, which might mean that more fragments are materialized than are needed.
- Parameters
columns – List of columns from this table to fetch
quals – List of qualifiers in conjunctive normal form. See the documentation for FragmentManager.filter_fragments for the actual format.
- Returns
List of dictionaries of results
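Qualifiers in conjunctive normal form are an AND of OR-clauses, which matches the nested Sequence[Sequence[Tuple[str, str, Any]]] type: each inner list is one clause of (column, operator, value) terms. The sketch below evaluates such a structure against in-memory rows. It is plain Python and not Splitgraph's filtering code: row_matches is hypothetical, and the set of operators is limited to those listed in OPERATORS for this sketch only.

```python
import operator
from typing import Any, Callable, Dict, Sequence, Tuple

Qual = Tuple[str, str, Any]

# Operators understood by this sketch only.
OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "=": operator.eq,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def row_matches(row: Dict[str, Any], quals: Sequence[Sequence[Qual]]) -> bool:
    """CNF: every clause must hold; a clause holds if any of its terms does."""
    return all(
        any(OPERATORS[op](row[column], value) for column, op, value in clause)
        for clause in quals
    )


rows = [
    {"id": 1, "name": "apple"},
    {"id": 5, "name": "pear"},
    {"id": 9, "name": "plum"},
]
# (id > 3) AND (name = 'pear' OR name = 'plum')
quals = [[("id", ">", 3)], [("name", "=", "pear"), ("name", "=", "plum")]]
matching = [row["id"] for row in rows if row_matches(row, quals)]
```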
query_indirect
(columns: List[str], quals: Optional[Sequence[Sequence[Tuple[str, str, Any]]]]) → Tuple[Iterator[bytes], Callable, splitgraph.core.table.QueryPlan]¶Run a read-only query against this table without materializing it. Instead of actual results, this returns a generator of SQL queries that the caller can use to get the results as well as a callback that the caller has to run after they’re done consuming the results.
In particular, the query generator will prefer returning direct queries to Splitgraph objects and only when those are exhausted will it start materializing delta-compressed fragments.
This is an advanced method: you probably want to call table.query().
- Parameters
columns – List of columns from this table to fetch
quals – List of qualifiers in conjunctive normal form. See the documentation for FragmentManager.filter_fragments for the actual format.
- Returns
Generator of queries (bytes), a callback and a query plan object (containing stats that are fully populated after the callback has been called to end the query).
query_lazy
(columns: List[str], quals: Sequence[Sequence[Tuple[str, str, Any]]]) → Iterator[Iterator[Dict[str, Any]]]¶Run a read-only query against this table without materializing it.
- Parameters
columns – List of columns from this table to fetch
quals – List of qualifiers in conjunctive normal form. See the documentation for FragmentManager.filter_fragments for the actual format.
- Returns
Generator of dictionaries of results.
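The nested Iterator[Iterator[Dict[str, Any]]] return type means that results arrive in batches that are only produced when consumed. The sketch below shows that behaviour in plain Python. read_fragment and lazy_results are hypothetical, and mapping one inner iterator to one fragment is an assumption that the text above does not state.

```python
from itertools import chain
from typing import Any, Dict, Iterator, List

fetched: List[str] = []


def read_fragment(name: str) -> Iterator[Dict[str, Any]]:
    """Yield the rows of one fragment; runs only once iteration starts."""
    fetched.append(name)
    yield {"fragment": name}


def lazy_results(fragments: List[str]) -> Iterator[Iterator[Dict[str, Any]]]:
    """Yield one row iterator per fragment without reading any of them."""
    for name in fragments:
        yield read_fragment(name)


results = lazy_results(["f1", "f2", "f3"])
first_row = next(next(results))
fetched_after_first = list(fetched)

# Forcing evaluation, as an eager wrapper would, touches every fragment.
remaining = list(chain.from_iterable(results))
fetched_after_all = list(fetched)
```

A caller that stops after the first row has only touched the first fragment, whereas flattening everything into a list reads all of them.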
reindex
(extra_indexes: Dict[str, Union[List[str], Dict[str, Dict[str, Any]]]], raise_on_patch_objects=True) → List[str]¶Run extra indexes on all objects in this table and update their metadata. This only works on objects that don’t have any deletions or upserts (have a deletion hash of 000000…).
- Parameters
extra_indexes – Dictionary of {index_type: column: index_specific_kwargs}.
raise_on_patch_objects – If True, will raise an exception if any objects in the table overwrite any other objects. If False, will log a warning but will reindex all non-patch objects.
- Returns
List of objects that were reindexed.
splitgraph.core.table.
create_foreign_table
(schema: str, server: str, table_name: str, schema_spec: List[splitgraph.core.types.TableColumn], internal_table_name: Optional[str] = None, extra_options: Optional[Dict[str, str]] = None)¶
splitgraph.core.table.
merge_index_data
(current_index: Dict[str, Any], new_index: Dict[str, Any])¶