splitgraph.hooks package

Module contents

Various hooks for extending Splitgraph, including:

  • External object handlers (splitgraph.hooks.external_objects) that allow objects to be downloaded from and uploaded to locations other than the remote Splitgraph engine.

  • Data sources (splitgraph.hooks.data_sources) that allow data to be added to Splitgraph, e.g. by using the Postgres engine’s FDW interface to mount other external databases on the engine.

Submodules

splitgraph.hooks.external_objects module

Hooks for registering handlers to upload/download objects from external locations into Splitgraph’s cache.

class splitgraph.hooks.external_objects.ExternalObjectHandler(params: Dict[Any, Any])

Bases: object

Framework for dumping objects from the Splitgraph cache to an external location. This allows the objects to be stored somewhere other than the actual remote engine.

External object handlers must extend this class and be registered in the Splitgraph config.

For an example of how this can be used, see splitgraph.hooks.s3: it’s a handler that allows objects to be uploaded to an S3/S3-compatible host using the Minio API. It’s registered in the config as follows:

    [external_handlers]
    S3=splitgraph.hooks.s3.S3ExternalObjectHandler

The protocol and the URLs returned by this handler are stored in splitgraph_meta.external_objects and used to download the objects back into the Splitgraph cache when they are needed.
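The [external_handlers] entry maps a handler name to a dotted class path. As a rough sketch of how such a mapping could be resolved, the snippet below parses the section with Python's configparser and imports each class with importlib. It assumes an INI-style config file; load_handler_classes and resolve_dotted_path are hypothetical helpers, not Splitgraph's own config loader.

```python
import configparser
import importlib
from typing import Dict


def resolve_dotted_path(path: str) -> type:
    """Import "package.module.ClassName" and return the class object."""
    module_name, _, attr = path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


def load_handler_classes(config_text: str) -> Dict[str, type]:
    """Hypothetical helper: map each [external_handlers] name to its class."""
    parser = configparser.ConfigParser()
    # Preserve key case so that "S3" is not lowercased to "s3".
    parser.optionxform = str
    parser.read_string(config_text)
    if not parser.has_section("external_handlers"):
        return {}
    return {
        name: resolve_dotted_path(path)
        for name, path in parser.items("external_handlers")
    }
```

With the config above, this would resolve splitgraph.hooks.s3.S3ExternalObjectHandler, which requires Splitgraph to be importable. Setting optionxform = str matters because configparser lowercases keys by default, which would turn S3 into s3.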

download_objects(objects: List[Tuple[str, str]], remote_engine: PsycopgEngine) → Sequence[str]

Download objects from the external location into the Splitgraph cache.

Parameters
  • objects – List of tuples (object_id, object_url) pointing to where this handler previously uploaded the objects.

  • remote_engine – An instance of the Engine class that the objects will be registered on.

Returns

A list of object IDs that have been successfully downloaded.

upload_objects(objects: List[str], remote_engine: PsycopgEngine) → Sequence[Tuple[str, str]]

Upload objects from the Splitgraph cache to an external location

Parameters
  • objects – List of object IDs to upload.

  • remote_engine – An instance of the Engine class that the objects will be registered on.

Returns

A list of (object ID, URL) tuples for the objects that were successfully uploaded.
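The upload/download contract above can be illustrated with a toy handler that keeps object payloads in a dictionary instead of an external store. To stay self-contained, it subclasses a local stand-in for ExternalObjectHandler, and the hypothetical dump_object/load_object callables (passed through params purely for this sketch) replace reading from and writing to the Splitgraph cache. A real handler would subclass splitgraph.hooks.external_objects.ExternalObjectHandler and work through remote_engine.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple


class ExternalObjectHandler:
    """Local stand-in for splitgraph.hooks.external_objects.ExternalObjectHandler."""

    def __init__(self, params: Dict[Any, Any]) -> None:
        self.params = params


class InMemoryHandler(ExternalObjectHandler):
    """Toy handler: "uploads" objects into a dict keyed by a made-up URL."""

    def __init__(self, params: Dict[Any, Any]) -> None:
        super().__init__(params)
        self.store: Dict[str, bytes] = {}
        # Hypothetical hooks standing in for reading/writing the local cache.
        self.dump_object: Callable[[str], bytes] = params["dump_object"]
        self.load_object: Callable[[str, bytes], None] = params["load_object"]

    def upload_objects(
        self, objects: List[str], remote_engine: Any
    ) -> Sequence[Tuple[str, str]]:
        uploaded = []
        for object_id in objects:
            url = "mem://" + object_id  # hypothetical URL scheme
            try:
                self.store[url] = self.dump_object(object_id)
            except KeyError:
                continue  # object not in the stand-in cache: report only successes
            uploaded.append((object_id, url))
        return uploaded

    def download_objects(
        self, objects: List[Tuple[str, str]], remote_engine: Any
    ) -> Sequence[str]:
        downloaded = []
        for object_id, url in objects:
            if url not in self.store:
                continue  # missing at the external location
            self.load_object(object_id, self.store[url])
            downloaded.append(object_id)
        return downloaded
```

The returned URLs are what would end up in splitgraph_meta.external_objects, so download_objects only needs the (object ID, URL) pairs to fetch objects back.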

splitgraph.hooks.external_objects.get_external_object_handler(name: str, handler_params: Dict[Any, Any]) → splitgraph.hooks.external_objects.ExternalObjectHandler

Load an external protocol handler by its name, initializing it with optional parameters.

splitgraph.hooks.external_objects.register_upload_download_handler(name: str, handler_class: Callable[[], splitgraph.hooks.external_objects.ExternalObjectHandler]) → None

Register an external protocol handler. The handler class must extend ExternalObjectHandler; see that class for the required method signatures.
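Registering a handler and later loading it by name amounts to a name-to-class registry. A minimal sketch of that pattern with a plain dictionary; register_handler, get_handler and _HANDLERS are hypothetical names, not Splitgraph's implementation.

```python
from typing import Any, Callable, Dict

# Hypothetical module-level registry: handler name -> handler class/factory.
_HANDLERS: Dict[str, Callable[..., Any]] = {}


def register_handler(name: str, handler_class: Callable[..., Any]) -> None:
    """Register a handler class under a (case-sensitive) name."""
    _HANDLERS[name] = handler_class


def get_handler(name: str, handler_params: Dict[Any, Any]) -> Any:
    """Instantiate the handler registered under `name` with its parameters."""
    try:
        handler_class = _HANDLERS[name]
    except KeyError:
        raise ValueError(f"No external object handler registered as {name!r}") from None
    return handler_class(handler_params)
```

Raising a clear error for unknown names helps when a config entry refers to a handler that was never registered.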

splitgraph.hooks.mount_handlers module

Extra wrapper code for mount handlers

splitgraph.hooks.mount_handlers.mount(mountpoint: str, mount_handler: str, handler_kwargs: Dict[str, Any], overwrite: bool = True, tables: Optional[TableInfo] = None) → None

Mounts a foreign database via an FDW (without creating new Splitgraph objects)

Parameters
  • mountpoint – Mountpoint to import the new tables into.

  • mount_handler – The type of the mounted database.

  • handler_kwargs – Dictionary of options to pass to the mount handler.

  • overwrite – Delete the foreign server if it already exists. Used by mount_postgres for data pulls.

  • tables – Tables to mount: either a list of table names or a dictionary mapping table names to their schemas.

splitgraph.hooks.mount_handlers.mount_postgres(mountpoint, **kwargs) → None

Mount a Postgres database.

Mounts a schema on a remote Postgres database as a set of foreign tables locally. 

Parameters
  • mountpoint – Schema to mount the remote into.

  • server – Database hostname.

  • port – Port the Postgres server is running on.

  • username – A read-only user that the database will be accessed as.

  • password – Password for the read-only user.

  • dbname – Remote database name.

  • remote_schema – Remote schema name.

  • extra_server_args – Dictionary of extra arguments to pass to the foreign server

  • tables – Tables to mount (default all). If a list, IMPORT FOREIGN SCHEMA is used. If a dictionary, it must have the format {"table_name": {"col_1": "type_1", …}}.
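To make the forms of tables concrete, the sketch below builds the kind of Postgres statements each one corresponds to with postgres_fdw: IMPORT FOREIGN SCHEMA (restricted with LIMIT TO for a list) or explicit CREATE FOREIGN TABLE definitions for a dictionary. The helper and server name are hypothetical, quoting is simplified, and the SQL that Splitgraph actually generates may differ.

```python
from typing import Dict, List, Optional, Union

TablesArg = Optional[Union[List[str], Dict[str, Dict[str, str]]]]


def quote_ident(name: str) -> str:
    """Simplified Postgres identifier quoting (doubles embedded quotes)."""
    return '"' + name.replace('"', '""') + '"'


def foreign_table_sql(
    server: str, remote_schema: str, mountpoint: str, tables: TablesArg
) -> List[str]:
    if tables is None or isinstance(tables, list):
        # None: import every table; list: restrict to the named remote tables.
        limit = ""
        if tables is not None:
            limit = " LIMIT TO (" + ", ".join(quote_ident(t) for t in tables) + ")"
        return [
            f"IMPORT FOREIGN SCHEMA {quote_ident(remote_schema)}{limit} "
            f"FROM SERVER {quote_ident(server)} INTO {quote_ident(mountpoint)}"
        ]
    # Dict form: column names and types are given explicitly.
    # Note: OPTIONS values are not escaped in this simplified sketch.
    statements = []
    for table, columns in tables.items():
        cols = ", ".join(f"{quote_ident(c)} {t}" for c, t in columns.items())
        statements.append(
            f"CREATE FOREIGN TABLE {quote_ident(mountpoint)}.{quote_ident(table)} ({cols}) "
            f"SERVER {quote_ident(server)} "
            f"OPTIONS (schema_name '{remote_schema}', table_name '{table}')"
        )
    return statements
```

The list form lets Postgres introspect column types from the remote server, while the dictionary form pins the local column definitions.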

splitgraph.hooks.s3 module

Plugin for uploading Splitgraph objects from the cache to an external S3-like object store

class splitgraph.hooks.s3.S3ExternalObjectHandler(params: Dict[Any, Any])

Bases: splitgraph.hooks.external_objects.ExternalObjectHandler

Uploads/downloads the objects to/from an S3/S3-compatible host using the Minio client.

The handler is “attached” to a given registry which manages issuing pre-signed GET/PUT URLs.

The handler supports a parameter threads specifying the number of threads used to upload the objects.

download_objects(objects: List[Tuple[str, str]], remote_engine: PsycopgEngine) → List[str]

Download objects from Minio.

Parameters
  • objects – List of (object ID, object URL) tuples, where the object URL is the ID that the object is stored under.

upload_objects(objects: List[str], remote_engine: PsycopgEngine) → List[Tuple[str, str]]

Upload objects to Minio

Parameters
  • remote_engine – Remote Engine instance

  • objects – List of object IDs to upload

Returns

List of tuples with successfully uploaded objects and their URLs.
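The upload path described for this handler (the registry issues pre-signed PUT URLs, the uploads are spread over threads workers, and only successes are reported) can be sketched with concurrent.futures. The get_urls and put callables are hypothetical stand-ins for the registry call and for the HTTP upload of one object; each object is simplified to a single URL, whereas the registry-side functions return several URLs per object.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Tuple


def upload_in_parallel(
    object_ids: List[str],
    get_urls: Callable[[List[str]], List[str]],
    put: Callable[[str, str], None],
    threads: int = 4,
) -> List[Tuple[str, str]]:
    """Upload each object to its pre-signed URL; return (object ID, URL) for successes."""
    urls = get_urls(object_ids)  # assumed: one pre-signed URL per object, same order

    def upload_one(pair: Tuple[str, str]) -> Optional[Tuple[str, str]]:
        object_id, url = pair
        try:
            put(object_id, url)
        except Exception:
            return None  # treat any error as a failed upload
        return object_id, url

    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map preserves input order, so results line up with object_ids.
        results = list(pool.map(upload_one, zip(object_ids, urls)))
    return [r for r in results if r is not None]
```

Because the URLs are pre-signed by the registry, the client never needs the S3 credentials themselves.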

splitgraph.hooks.s3.get_object_download_urls(remote_engine, remote_object_ids)

splitgraph.hooks.s3.get_object_upload_urls(remote_engine, objects)

splitgraph.hooks.s3_server module

S3 registry-side routines, called from the Python stored procedure, that are aware of the actual S3 access credentials and generate pre-signed URLs to upload/download objects.

splitgraph.hooks.s3_server.delete_objects(client: minio.api.Minio, object_ids: List[str]) → None

Delete objects stored in Minio

Parameters
  • client – Minio client

  • object_ids – List of Splitgraph object IDs to delete

splitgraph.hooks.s3_server.get_object_download_urls(s3_host: str, object_ids: List[str]) → List[List[str]]

Return a list of pre-signed URLs that each part of an object can be downloaded from.

Parameters
  • s3_host – S3 host that the objects are stored on

  • object_ids – List of object IDs

Returns

A list of lists [(object URL, object footer URL, object schema URL)]

splitgraph.hooks.s3_server.get_object_upload_urls(s3_host: str, object_ids: List[str]) → List[List[str]]

Return a list of pre-signed URLs that each part of an object can be uploaded to.

Parameters
  • s3_host – S3 host that the objects are stored on

  • object_ids – List of object IDs

Returns

A list of lists [(object URL, object footer URL, object schema URL)]
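Both registry-side URL functions return one inner list per object, holding URLs for its three parts. A sketch of building that shape, where presign stands in for a pre-signing call (for example, the Minio client's presigned_get_object or presigned_put_object with the bucket already bound). The part key suffixes are hypothetical; the actual storage keys are not documented here.

```python
from typing import Callable, List

# Hypothetical key suffixes for the three parts of an object.
PART_SUFFIXES = ("", ".footer", ".schema")


def presigned_urls(object_ids: List[str], presign: Callable[[str], str]) -> List[List[str]]:
    """Return [[object URL, footer URL, schema URL], ...] in object_ids order."""
    return [
        [presign(object_id + suffix) for suffix in PART_SUFFIXES]
        for object_id in object_ids
    ]
```

Keeping the output in the same order as object_ids lets the caller zip the URLs back to the objects without extra bookkeeping.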

splitgraph.hooks.s3_server.list_objects(client: minio.api.Minio) → List[str]

List objects stored in Minio

Parameters

client – Minio client

Returns

List of Splitgraph object IDs

splitgraph.hooks.splitfile_commands module

A framework for custom Splitfile commands. The execution flow is as follows:

  • When the Splitfile executor finds an unknown command, it looks for an entry in the config file:

    [commands]
    RUN=splitgraph.plugins.Run

  • The command class must extend PluginCommand and is instantiated at every invocation.

  • The command’s calc_hash() method is run. The resultant command context hash is combined with the current image hash to produce the new image hash: if it already exists, then the image is simply checked out.

  • Otherwise (or if calc_hash is undefined or returns None), execute() is run; this is where the actual command should be implemented. If it returns a hash, this hash is used for the new image. If this hash already exists, the existing image is checked out instead. If the command returns None, a random hash is generated for the new image.
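The execution flow above can be sketched as a single function. It assumes that image hashes are derived by SHA-256 over the previous image hash concatenated with the command context hash, and that a hash returned by execute() is combined the same way; that combination rule, run_command and existing_images are illustrative assumptions, not Splitgraph's executor.

```python
import hashlib
import os
from typing import Any, Set, Tuple


def combine(prev_image_hash: str, context_hash: str) -> str:
    # Assumed combination rule: SHA-256 over the two hex strings.
    return hashlib.sha256((prev_image_hash + context_hash).encode("ascii")).hexdigest()


def run_command(
    command: Any, repository: Any, args: list, prev_image_hash: str, existing_images: Set[str]
) -> Tuple[str, bool, bool]:
    """Return (new image hash, whether execute() ran, whether the image already existed)."""
    calc_hash = getattr(command, "calc_hash", None)
    context_hash = calc_hash(repository, args) if calc_hash is not None else None
    if context_hash is not None:
        new_hash = combine(prev_image_hash, context_hash)
        if new_hash in existing_images:
            return new_hash, False, True  # pre-flight hit: check out, skip execute()
        command.execute(repository, args)  # its return value is ignored in this case
        return new_hash, True, False
    returned = command.execute(repository, args)
    if returned is None:
        return os.urandom(32).hex(), True, False  # random 64-hex-digit image hash
    new_hash = combine(prev_image_hash, returned)  # assumed: combined like calc_hash
    return new_hash, True, new_hash in existing_images
```

The point of calc_hash() shows up in the first branch: when the pre-flight hash matches an existing image, the potentially expensive execute() is never called.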

class splitgraph.hooks.splitfile_commands.PluginCommand

Bases: object

Base class for custom Splitfile commands.

calc_hash(repository, args)

Calculates the command context hash for this custom command. If either the command context hash or the previous image hash has changed, then the image hash produced by this command will change. Consequently, two commands with the same command context hashes are assumed to have the same effect on any Splitgraph images.

This is supposed to be a lightweight method intended for pre-flight image hash calculations (without performing the actual transformation). If it returns None, the actual transformation is run anyway.

For example, for a command that imports some data from an external URL, this could be the hash of the last modified timestamp provided by the external data vendor. If the timestamp is unchanged, the data is unchanged and so the actual command won’t be re-executed.

Parameters
  • repository – Splitgraph Repository object pointing to the schema with the checked-out image that the command is being run against.

  • args – Positional arguments to the command

Returns

Command context hash (a string of 64 hexadecimal digits)
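Following the external-data example above, a calc_hash() could hash the vendor's last-modified timestamp, which yields the 64-hex-digit string the method is expected to return. The sketch uses a local stand-in for PluginCommand, and ImportFromURL with its injected fetch_last_modified callable is hypothetical; a real command would subclass splitgraph.hooks.splitfile_commands.PluginCommand and query the data source itself.

```python
import hashlib
from typing import Callable, List, Optional


class PluginCommand:
    """Local stand-in for splitgraph.hooks.splitfile_commands.PluginCommand."""

    def calc_hash(self, repository, args: List[str]) -> Optional[str]:
        return None

    def execute(self, repository, args: List[str]) -> Optional[str]:
        raise NotImplementedError


class ImportFromURL(PluginCommand):
    """Hypothetical command whose context hash tracks the source's last-modified time."""

    def __init__(self, fetch_last_modified: Callable[[str], Optional[str]]) -> None:
        self.fetch_last_modified = fetch_last_modified

    def calc_hash(self, repository, args: List[str]) -> Optional[str]:
        last_modified = self.fetch_last_modified(args[0])
        if last_modified is None:
            return None  # unknown timestamp: let the command run
        # Include the URL too, so different sources never share a context hash.
        payload = f"{args[0]}\n{last_modified}".encode("utf-8")
        return hashlib.sha256(payload).hexdigest()  # 64 hexadecimal digits
```

Returning None when the timestamp is unavailable is the safe default: the transformation is then run rather than skipped on a stale hash.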

execute(repository, args)

Execute the custom command against the target schema, optionally returning the new image hash. The contract for the command is as follows (though it is not currently enforced by the runtime):

  • Has to use get_engine().run_sql (or run_sql_batch) to interact with the engine.

  • Can only write to the schema with the checked-out repository (run_sql runs non-schema-qualified statements against the correct schema).

  • Can inspect splitgraph_meta (e.g. to find the current HEAD) for the repository.

  • Can’t alter the versioning of the repository.

Parameters
  • repository – Splitgraph Repository object pointing to the schema with the checked-out image that the command is being run against.

  • args – Positional arguments to the command

Returns

Command context hash (a string of 64 hexadecimal digits). If calc_hash() had previously returned a hash, this hash is ignored. If both this command and calc_hash() return None, the hash is randomly generated.
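An execute() that follows the contract above might issue only non-schema-qualified SQL and return None, leaving the image hash to be generated randomly. Here run_sql is injected as a stand-in for get_engine().run_sql, assumed to accept (statement, arguments) like psycopg2's cursor.execute, so the sketch runs without an engine; CreateLookupTable and its SQL are hypothetical.

```python
from typing import Callable, List, Optional


class PluginCommand:
    """Local stand-in for splitgraph.hooks.splitfile_commands.PluginCommand."""

    def calc_hash(self, repository, args: List[str]) -> Optional[str]:
        return None

    def execute(self, repository, args: List[str]) -> Optional[str]:
        raise NotImplementedError


class CreateLookupTable(PluginCommand):
    """Hypothetical command: CREATE_LOOKUP <table> <value> [<value> ...]."""

    def __init__(self, run_sql: Callable[..., object]) -> None:
        self.run_sql = run_sql

    def execute(self, repository, args: List[str]) -> Optional[str]:
        table, values = args[0], args[1:]
        # Unqualified table name: the statements target the checked-out schema.
        self.run_sql(f'CREATE TABLE "{table}" (value TEXT PRIMARY KEY)')
        for value in values:
            # Parameterised insert; %s is psycopg2's placeholder style.
            self.run_sql(f'INSERT INTO "{table}" (value) VALUES (%s)', (value,))
        return None  # no context hash: a random image hash is generated
```

Since this command defines no calc_hash() and returns None, every run produces a new image; a command that can cheaply predict its effect should implement calc_hash() instead.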