splitgraph.hooks package
Module contents
Various hooks for extending Splitgraph, including:
- External object handlers (splitgraph.hooks.external_objects) that allow objects to be downloaded from and uploaded to locations other than the remote Splitgraph engine.
- FDW handlers (splitgraph.hooks.mount_handlers) that use the Postgres engine’s FDW interface to mount other external databases on the engine.
- Splitfile commands (splitgraph.hooks.splitfile_commands) that define custom data transformation steps compatible with the Splitfile framework.
Submodules
splitgraph.hooks.external_objects module
Hooks for registering handlers to upload/download objects from external locations into Splitgraph’s cache.
class splitgraph.hooks.external_objects.ExternalObjectHandler(params: Dict[Any, Any])
Bases: object
Framework for dumping objects from the Splitgraph cache to an external location. This allows objects to be stored somewhere other than the actual remote engine.
External object handlers must extend this class and be registered in the Splitgraph config.
For an example of how this can be used, see splitgraph.hooks.s3: it’s a handler that uploads objects to an S3 or S3-compatible host using the Minio API. It’s registered in the config as follows:
[external_handlers]
S3=splitgraph.hooks.s3.S3ExternalObjectHandler
The protocol and the URLs returned by this handler are stored in splitgraph_meta.external_objects and used to download the objects back into the Splitgraph cache when they are needed.
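The config entry maps a handler name to the dotted path of a handler class. As a minimal sketch, assuming the section is parsed with Python's configparser and each path is resolved with importlib (an illustration, not necessarily how Splitgraph reads its config), the lookup could work like this. The helper name load_handler_classes is hypothetical.

```python
import configparser
import importlib
from typing import Dict


def load_handler_classes(config_text: str, section: str = "external_handlers") -> Dict[str, type]:
    """Map each handler name in `section` to the class its dotted path points at.

    Hypothetical helper, for illustration only.
    """
    parser = configparser.ConfigParser()
    # Keep option names case-sensitive so "S3" is not lowercased to "s3".
    parser.optionxform = str
    parser.read_string(config_text)

    classes = {}
    for name, dotted_path in parser.items(section):
        # "package.module.ClassName" -> ("package.module", "ClassName")
        module_name, _, class_name = dotted_path.rpartition(".")
        module = importlib.import_module(module_name)
        classes[name] = getattr(module, class_name)
    return classes


# A stdlib class stands in for the handler so the sketch runs without Splitgraph;
# "splitgraph.hooks.s3.S3ExternalObjectHandler" would be resolved the same way.
example = """
[external_handlers]
S3=collections.OrderedDict
"""
print(load_handler_classes(example))
```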
download_objects(objects: List[Tuple[str, str]], remote_engine: PsycopgEngine) → Sequence[str]
Download objects from the external location into the Splitgraph cache.
- Parameters
objects – List of tuples (object_id, object_url), where object_url is the location this handler previously uploaded the object to.
remote_engine – An instance of the Engine class that the objects will be registered on
- Returns
A list of object IDs that have been successfully downloaded.
upload_objects(objects: List[str], remote_engine: PsycopgEngine) → Sequence[Tuple[str, str]]
Upload objects from the Splitgraph cache to an external location.
- Parameters
objects – List of object IDs to upload
remote_engine – An instance of the Engine class that the objects will be registered on
- Returns
A list of successfully uploaded object IDs and URLs they can be found at.
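To make the two-method contract concrete, here is a minimal sketch of a handler that "uploads" objects to a local directory. The class name, the path parameter and the placeholder file contents are hypothetical: a real handler would export and re-import the object data through the engine, which this sketch omits. It subclasses the real ExternalObjectHandler when Splitgraph is importable and a stand-in with the documented constructor otherwise.

```python
import os
import tempfile
from typing import Any, Dict, List, Sequence, Tuple

try:
    from splitgraph.hooks.external_objects import ExternalObjectHandler
except ImportError:
    # Minimal stand-in with the documented constructor, so the sketch runs without Splitgraph.
    class ExternalObjectHandler:  # type: ignore[no-redef]
        def __init__(self, params: Dict[Any, Any]) -> None:
            self.params = params


class DirectoryExternalObjectHandler(ExternalObjectHandler):
    """Hypothetical handler that keeps one file per object under params["path"]."""

    def __init__(self, params: Dict[Any, Any]) -> None:
        super().__init__(params)
        self.base_dir = params["path"]

    def upload_objects(self, objects: List[str], remote_engine: Any) -> Sequence[Tuple[str, str]]:
        os.makedirs(self.base_dir, exist_ok=True)
        uploaded = []
        for object_id in objects:
            url = os.path.join(self.base_dir, object_id)
            # A real handler would export the object's data from the cache here;
            # this sketch only writes a placeholder file.
            with open(url, "w") as f:
                f.write(object_id)
            uploaded.append((object_id, url))
        return uploaded

    def download_objects(self, objects: List[Tuple[str, str]], remote_engine: Any) -> Sequence[str]:
        downloaded = []
        for object_id, url in objects:
            # Report only objects whose stored copy exists; a real handler would
            # load the data back into the cache here.
            if os.path.exists(url):
                downloaded.append(object_id)
        return downloaded


with tempfile.TemporaryDirectory() as tmp:
    handler = DirectoryExternalObjectHandler({"path": tmp})
    uploaded = handler.upload_objects(["o1", "o2"], remote_engine=None)
    # "o3" was never uploaded, so only o1 and o2 are reported as downloaded.
    print(handler.download_objects(list(uploaded) + [("o3", os.path.join(tmp, "o3"))], remote_engine=None))
```

The URLs returned by upload_objects are exactly what download_objects later receives back, which is why the sketch uses the file path itself as the URL.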
splitgraph.hooks.external_objects.get_external_object_handler(name: str, handler_params: Dict[Any, Any]) → splitgraph.hooks.external_objects.ExternalObjectHandler
Load an external protocol handler by its name, initializing it with optional parameters.
splitgraph.hooks.external_objects.register_upload_download_handler(name: str, handler_class: Callable[[…], splitgraph.hooks.external_objects.ExternalObjectHandler]) → None
Register an external protocol handler under a given name. See ExternalObjectHandler for the methods that the handler class must implement.
splitgraph.hooks.mount_handlers module
Hooks for additional handlers used to mount other databases via FDW. These handlers become available in the command line tool (via sgr mount) and in the Splitfile interpreter (via FROM MOUNT).
splitgraph.hooks.mount_handlers.get_mount_handler(mount_handler: str) → Callable
Returns a mount function for a given handler. The mount function must have the signature (mountpoint, server, port, username, password, handler_kwargs).
splitgraph.hooks.mount_handlers.get_mount_handlers() → List[str]
Returns the names of all registered mount handlers.
splitgraph.hooks.mount_handlers.init_fdw(engine: PostgresEngine, server_id: str, wrapper: str, server_options: Optional[Dict[str, Optional[str]]] = None, user_options: Optional[Dict[str, str]] = None, overwrite: bool = True) → None
Sets up a foreign data server on the engine.
- Parameters
engine – PostgresEngine
server_id – Name to give the foreign server; must be unique. If a server with this name already exists, it is deleted and recreated when overwrite is True.
wrapper – Name of the foreign data wrapper (must be installed as an extension on the engine)
server_options – Dictionary of FDW options
user_options – Dictionary of user options
overwrite – If the server already exists, delete and recreate it.
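These parameters line up with standard PostgreSQL DDL. The sketch below builds the statements a function like init_fdw plausibly runs (DROP SERVER, CREATE SERVER, CREATE USER MAPPING); it illustrates that DDL rather than reproducing Splitgraph's implementation. build_fdw_ddl and the quoting helpers are hypothetical, and so are the assumptions that user_options become user mapping options for CURRENT_USER and that option keys are plain option names.

```python
from typing import Dict, List, Optional


def quote_ident(name: str) -> str:
    # Double-quote an identifier, escaping embedded double quotes.
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    # Single-quote a string literal, escaping embedded single quotes.
    return "'" + value.replace("'", "''") + "'"


def options_clause(options: Dict[str, Optional[str]]) -> str:
    # Options whose value is None are skipped; keys are assumed to be plain option names.
    pairs = [f"{key} {quote_literal(value)}" for key, value in options.items() if value is not None]
    return f" OPTIONS ({', '.join(pairs)})" if pairs else ""


def build_fdw_ddl(
    server_id: str,
    wrapper: str,
    server_options: Optional[Dict[str, Optional[str]]] = None,
    user_options: Optional[Dict[str, str]] = None,
    overwrite: bool = True,
) -> List[str]:
    """Return the DDL statements for setting up a foreign server (hypothetical helper)."""
    server = quote_ident(server_id)
    statements = []
    if overwrite:
        # CASCADE also drops user mappings and foreign tables that depend on the server.
        statements.append(f"DROP SERVER IF EXISTS {server} CASCADE")
        if_not_exists = ""
    else:
        # Keep an existing server instead of failing on a duplicate name.
        if_not_exists = "IF NOT EXISTS "
    statements.append(
        f"CREATE SERVER {if_not_exists}{server} FOREIGN DATA WRAPPER {quote_ident(wrapper)}"
        + options_clause(server_options or {})
    )
    if user_options:
        statements.append(
            f"CREATE USER MAPPING {if_not_exists}FOR CURRENT_USER SERVER {server}"
            + options_clause(dict(user_options))
        )
    return statements


for statement in build_fdw_ddl(
    "origin_server",
    "postgres_fdw",
    server_options={"host": "db.example.com", "port": "5432"},
    user_options={"user": "reader", "password": "secret"},
):
    print(statement + ";")
```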
splitgraph.hooks.mount_handlers.mount(mountpoint: str, mount_handler: str, handler_kwargs: Dict[str, Union[str, int, None, List[str], Dict[str, Union[Dict[str, str], str]]]]) → None
Mounts a foreign database via Postgres FDW (without creating new Splitgraph objects).
- Parameters
mountpoint – Mountpoint to import the new tables into.
mount_handler – The type of the mounted database: the name of a registered mount handler (see get_mount_handlers), e.g. postgres_fdw or mongo_fdw.
handler_kwargs – Dictionary of options to pass to the mount handler.
splitgraph.hooks.mount_handlers.mount_mongo(mountpoint: str, server: str, port: int, username: str, password: str, **table_spec) → None
Mount a Mongo database.
Mounts one or more collections on a remote Mongo database as a set of foreign tables locally.
- Parameters
mountpoint – Schema to mount the remote into.
server – Database hostname.
port – Port the Mongo server is running on.
username – A read-only user that the database will be accessed as.
password – Password for the read-only user.
table_spec – A dictionary of form {"table_name": {"db": <dbname>, "coll": <collection>, "schema": {"col1": "type1"…}}}.
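Each table_spec entry describes one foreign table. As a hedged sketch, the helper below turns such a spec into mongo_fdw CREATE FOREIGN TABLE statements, using mongo_fdw's database and collection table options. The helper name and the server_id argument are hypothetical, and whether mount_mongo issues exactly this DDL is an assumption.

```python
from typing import Dict, List


def _ident(name: str) -> str:
    # Double-quote an identifier, escaping embedded double quotes.
    return '"' + name.replace('"', '""') + '"'


def _lit(value: str) -> str:
    # Single-quote a string literal, escaping embedded single quotes.
    return "'" + value.replace("'", "''") + "'"


def mongo_foreign_tables(mountpoint: str, server_id: str, table_spec: Dict[str, dict]) -> List[str]:
    """Build one CREATE FOREIGN TABLE statement per table_spec entry (hypothetical helper)."""
    statements = []
    for table_name, spec in table_spec.items():
        # "schema" maps column names to Postgres types; the types are used verbatim.
        columns = ", ".join(f"{_ident(col)} {col_type}" for col, col_type in spec["schema"].items())
        statements.append(
            f"CREATE FOREIGN TABLE {_ident(mountpoint)}.{_ident(table_name)} ({columns}) "
            f"SERVER {_ident(server_id)} "
            f"OPTIONS (database {_lit(spec['db'])}, collection {_lit(spec['coll'])})"
        )
    return statements


table_spec = {
    "stuff": {"db": "origindb", "coll": "stuff", "schema": {"name": "text", "duration": "numeric"}},
}
for statement in mongo_foreign_tables("staging", "staging_server", table_spec):
    print(statement + ";")
```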
splitgraph.hooks.mount_handlers.mount_mysql(mountpoint: str, server: str, port: int, username: str, password: str, remote_schema: str, tables: List[str] = None) → None
Mount a MySQL database.
Mounts a schema on a remote MySQL database as a set of foreign tables locally.
- Parameters
mountpoint – Schema to mount the remote into.
server – Database hostname.
port – Database port
username – A read-only user that the database will be accessed as.
password – Password for the read-only user.
remote_schema – Remote schema name.
tables – Tables to mount (default all).
splitgraph.hooks.mount_handlers.mount_postgres(mountpoint: str, server: str, port: Union[int, str], username: str, password: str, dbname: str, remote_schema: str, tables: Optional[List[str]] = None) → None
Mount a Postgres database.
Mounts a schema on a remote Postgres database as a set of foreign tables locally.
- Parameters
mountpoint – Schema to mount the remote into.
server – Database hostname.
port – Port the Postgres server is running on.
username – A read-only user that the database will be accessed as.
password – Password for the read-only user.
dbname – Remote database name.
remote_schema – Remote schema name.
tables – Tables to mount (default all).
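Both mount_mysql and mount_postgres take an optional tables list that defaults to all tables. In standard PostgreSQL DDL that choice corresponds to IMPORT FOREIGN SCHEMA with or without a LIMIT TO clause; the sketch below builds that statement. Whether these functions issue exactly this statement is an assumption, import_foreign_schema and server_id are hypothetical names, and the local schema (the mountpoint) must already exist when the statement runs.

```python
from typing import List, Optional


def _ident(name: str) -> str:
    # Double-quote an identifier, escaping embedded double quotes.
    return '"' + name.replace('"', '""') + '"'


def import_foreign_schema(
    remote_schema: str, server_id: str, mountpoint: str, tables: Optional[List[str]] = None
) -> str:
    """Build an IMPORT FOREIGN SCHEMA statement (hypothetical helper)."""
    statement = f"IMPORT FOREIGN SCHEMA {_ident(remote_schema)}"
    if tables:
        # Import only the listed tables; with no list, every table in the schema is imported.
        statement += " LIMIT TO (" + ", ".join(_ident(t) for t in tables) + ")"
    return statement + f" FROM SERVER {_ident(server_id)} INTO {_ident(mountpoint)}"


print(import_foreign_schema("public", "origin_server", "staging"))
print(import_foreign_schema("public", "origin_server", "staging", ["customers", "orders"]))
```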
splitgraph.hooks.mount_handlers.register_mount_handler(name: str, mount_function: Callable) → None
Registers a mount function under a given name. See get_mount_handler for the mount handler spec.
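Taken together, register_mount_handler, get_mount_handlers, get_mount_handler and mount describe a name-to-function registry. The toy, self-contained version below reuses those names for readability but is not Splitgraph's code: the _MOUNT_HANDLERS dict and mount_dummy are hypothetical, and passing handler_kwargs to the mount function as keyword arguments is an assumption (it is consistent with mount_mongo accepting **table_spec).

```python
from typing import Any, Callable, Dict, List

# Hypothetical module-level registry: handler name -> mount function.
_MOUNT_HANDLERS: Dict[str, Callable[..., None]] = {}


def register_mount_handler(name: str, mount_function: Callable[..., None]) -> None:
    _MOUNT_HANDLERS[name] = mount_function


def get_mount_handlers() -> List[str]:
    return list(_MOUNT_HANDLERS)


def get_mount_handler(mount_handler: str) -> Callable[..., None]:
    try:
        return _MOUNT_HANDLERS[mount_handler]
    except KeyError:
        raise ValueError(
            f"Unknown mount handler {mount_handler!r}; registered: {get_mount_handlers()}"
        ) from None


def mount(mountpoint: str, mount_handler: str, handler_kwargs: Dict[str, Any]) -> None:
    # Assumption: handler options are expanded into keyword arguments.
    get_mount_handler(mount_handler)(mountpoint, **handler_kwargs)


# A toy mount function that only records what it was asked to do.
calls = []


def mount_dummy(mountpoint: str, server: str, port: int, username: str, password: str, **options: Any) -> None:
    calls.append((mountpoint, server, port, username, options))


register_mount_handler("dummy", mount_dummy)
mount(
    "staging",
    "dummy",
    {"server": "localhost", "port": 5432, "username": "reader", "password": "pw", "remote_schema": "public"},
)
print(get_mount_handlers(), calls)
```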
splitgraph.hooks.s3 module
Plugin for uploading Splitgraph objects from the cache to an external S3-like object store
class splitgraph.hooks.s3.S3ExternalObjectHandler(params: Dict[Any, Any])
Bases: splitgraph.hooks.external_objects.ExternalObjectHandler
Uploads/downloads the objects to/from an S3 or S3-compatible host using the Minio client.
The handler is “attached” to a given registry which manages issuing pre-signed GET/PUT URLs.
The handler supports a parameter threads specifying the number of threads used to upload the objects.
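The threads parameter implies that uploads run concurrently. A minimal sketch of that idea with concurrent.futures.ThreadPoolExecutor follows; upload_one is a hypothetical stand-in for the per-object transfer (the real handler uses the Minio client and pre-signed URLs issued by the registry, which are not shown), and dropping failed uploads from the result is an assumption based on the "successfully uploaded" return contract.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Tuple


def upload_in_threads(
    object_ids: List[str],
    upload_one: Callable[[str], str],
    threads: int = 4,
) -> List[Tuple[str, str]]:
    """Upload objects concurrently, returning (object_id, url) for each success.

    `upload_one` is a hypothetical callable that uploads a single object and
    returns its URL, raising an exception on failure.
    """

    def attempt(object_id: str) -> Optional[Tuple[str, str]]:
        try:
            return object_id, upload_one(object_id)
        except Exception:
            # Failed uploads are left out of the result instead of aborting the batch.
            return None

    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map yields results in input order, regardless of completion order.
        return [result for result in pool.map(attempt, object_ids) if result is not None]


def fake_upload(object_id: str) -> str:
    # Simulated transfer: one object fails, the rest "succeed".
    if object_id == "bad":
        raise IOError("simulated failure")
    return "https://objects.example.com/" + object_id


print(upload_in_threads(["o1", "bad", "o2"], fake_upload, threads=2))
```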
download_objects(objects: List[Tuple[str, str]], remote_engine: PsycopgEngine) → List[str]
Download objects from Minio.
- Parameters
objects – List of tuples (object ID, object URL), where the object URL is the object ID it’s stored under
upload_objects(objects: List[str], remote_engine: PsycopgEngine) → List[Tuple[str, str]]
Upload objects to Minio.
- Parameters
remote_engine – Remote engine instance
objects – List of object IDs to upload
- Returns
List of tuples with successfully uploaded objects and their URLs.
splitgraph.hooks.s3.get_object_download_urls(remote_engine, remote_object_ids)
splitgraph.hooks.s3.get_object_upload_urls(remote_engine, objects)
splitgraph.hooks.splitfile_commands module
A framework for custom Splitfile commands. The execution flow is as follows:
1. When the Splitfile executor finds an unknown command, it looks for an entry in the config file:
[commands]
RUN=splitgraph.plugins.Run
The command class must extend PluginCommand and is instantiated at every invocation.
2. The command’s calc_hash() method is run. The resulting command context hash is combined with the current image hash to produce the new image hash: if an image with that hash already exists, it is simply checked out.
3. Otherwise (or if calc_hash is undefined or returns None), execute(), where the actual command should be implemented, is run. If it returns a hash, this hash is used for the new image; if an image with this hash already exists, it is checked out instead. If the command returns None, a random hash is generated for the new image.
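The hash bookkeeping in this flow can be sketched as a pure function. This is an illustration under stated assumptions, not the executor's code: resolve_image_hash and combine_hashes are hypothetical names, existing_images stands in for looking up images in the repository, and combining hashes as SHA-256 over the concatenated hex strings (also for a hash returned by execute()) is an assumption, since the text only says the hashes are "combined".

```python
import hashlib
import secrets
from typing import Callable, Optional, Set


def combine_hashes(previous_image_hash: str, command_hash: str) -> str:
    # Assumption: SHA-256 over the concatenated hex strings.
    return hashlib.sha256((previous_image_hash + command_hash).encode("ascii")).hexdigest()


def resolve_image_hash(
    previous_image_hash: str,
    calc_hash: Callable[[], Optional[str]],
    execute: Callable[[], Optional[str]],
    existing_images: Set[str],
) -> str:
    """Return the hash of the image a custom command produces (hypothetical helper)."""
    context_hash = calc_hash()
    if context_hash is not None:
        image_hash = combine_hashes(previous_image_hash, context_hash)
        if image_hash in existing_images:
            # Cache hit: the executor would check this image out without running the command.
            return image_hash
    # Cache miss, or no context hash: run the command itself.
    execute_hash = execute()
    if context_hash is not None:
        # A hash from calc_hash() takes precedence; execute()'s return value is ignored.
        return combine_hashes(previous_image_hash, context_hash)
    if execute_hash is None:
        # Neither method produced a hash: use a random 64-hex-digit image hash.
        return secrets.token_hex(32)
    # The executor would check out an existing image if this hash is already present.
    return combine_hashes(previous_image_hash, execute_hash)


previous = "0" * 64
print(resolve_image_hash(previous, lambda: "a" * 64, lambda: None, existing_images=set()))
```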
class splitgraph.hooks.splitfile_commands.PluginCommand
Bases: object
Base class for custom Splitfile commands.
calc_hash(repository, args)
Calculates the command context hash for this custom command. If either the command context hash or the previous image hash has changed, then the image hash produced by this command will change. Consequently, two commands with the same command context hashes are assumed to have the same effect on any Splitgraph images.
This is supposed to be a lightweight method intended for pre-flight image hash calculations (without performing the actual transformation). If it returns None, the actual transformation is run anyway.
For example, for a command that imports some data from an external URL, this could be the hash of the last modified timestamp provided by the external data vendor. If the timestamp is unchanged, the data is unchanged, so the actual command won’t be re-executed.
- Parameters
repository – Splitgraph Repository object pointing to the schema with the checked-out image that the command is being run against.
args – Positional arguments to the command
- Returns
Command context hash (a string of 64 hexadecimal digits)
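The timestamp example above could look roughly like this. ImportFromURL, the vendor_timestamps lookup table and hashing the URL together with the timestamp are all hypothetical; a real command would fetch the timestamp cheaply (for instance from an HTTP Last-Modified header). The sketch subclasses the real PluginCommand when Splitgraph is importable and a stand-in otherwise, and it leaves execute() unimplemented.

```python
import hashlib
from typing import Any, Dict, List, Optional

try:
    from splitgraph.hooks.splitfile_commands import PluginCommand
except ImportError:
    # Minimal stand-in so the sketch runs without Splitgraph installed.
    class PluginCommand:  # type: ignore[no-redef]
        pass


class ImportFromURL(PluginCommand):
    """Hypothetical command whose context hash tracks a vendor's last-modified timestamp."""

    # Hypothetical stand-in for the vendor's metadata: URL -> last-modified timestamp.
    vendor_timestamps: Dict[str, str] = {}

    def calc_hash(self, repository: Any, args: List[str]) -> Optional[str]:
        url = args[0]
        last_modified = self.vendor_timestamps.get(url)
        if last_modified is None:
            # No cheap fingerprint available: returning None makes the executor run execute().
            return None
        # Same URL and same timestamp give the same 64-hex-digit hash, so the image can be reused.
        return hashlib.sha256(f"{url}\n{last_modified}".encode("utf-8")).hexdigest()

    def execute(self, repository: Any, args: List[str]) -> Optional[str]:
        # The actual import would go here; it is out of scope for this sketch.
        raise NotImplementedError


ImportFromURL.vendor_timestamps["https://data.example.com/sales.csv"] = "2020-01-01T00:00:00Z"
command = ImportFromURL()
print(command.calc_hash(None, ["https://data.example.com/sales.csv"]))
```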
execute(repository, args)
Execute the custom command against the target schema, optionally returning the new image hash. The contract for the command is as follows (though it is not currently enforced by the runtime):
- It must use get_engine().run_sql (or run_sql_batch) to interact with the engine.
- It can only write to the schema with the checked-out repository (run_sql runs non-schema-qualified statements against the correct schema).
- It can inspect splitgraph_meta (e.g. to find the current HEAD) for the repository.
- It can’t alter the versioning of the repository.
- Parameters
repository – Splitgraph Repository object pointing to the schema with the checked-out image that the command is being run against.
args – Positional arguments to the command
- Returns
Command context hash (a string of 64 hexadecimal digits). If calc_hash() had previously returned a hash, this hash is ignored. If both this command and calc_hash() return None, the hash is randomly generated.