splitgraph.hooks package

Submodules

splitgraph.hooks.external_objects module

Hooks for registering handlers to upload/download objects from external locations into Splitgraph’s cache.

class splitgraph.hooks.external_objects.ExternalObjectHandler(params)

Bases: object

Framework for dumping objects from the Splitgraph cache to an external location. This allows the objects to be stored somewhere other than the actual remote engine.

External object handlers must extend this class and be registered in the Splitgraph config.

For an example of how this can be used, see splitgraph.hooks.s3: it’s a handler allowing objects to be uploaded to an S3/S3-compatible host using the Minio API. It’s registered in the config as follows:

[external_handlers]
S3=splitgraph.hooks.s3.S3ExternalObjectHandler

The protocol and the URLs returned by this handler are stored in splitgraph_meta.external_objects and used to download the objects back into the Splitgraph cache when they are needed.

download_objects(objects)

Download objects from the external location into the Splitgraph cache.

Parameters

objects – List of (object_id, object_url) tuples that this handler previously uploaded the objects to.

upload_objects(objects)

Upload objects from the Splitgraph cache to an external location.

Parameters

objects – List of object IDs to upload

Returns

A list of URLs (same length as objects) that the objects can be found at.
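As an illustration, a minimal handler might look like the sketch below. The class name, parameter key and URL scheme are all hypothetical; a real handler would extend splitgraph.hooks.external_objects.ExternalObjectHandler and perform actual network transfers (the base class is left out here so the sketch stays self-contained).

```python
# Hypothetical sketch of an external object handler. A real implementation
# would subclass splitgraph.hooks.external_objects.ExternalObjectHandler.
class HTTPExternalObjectHandler:
    def __init__(self, params):
        # params come from the handler's entry in the Splitgraph config.
        self.url_prefix = params.get("url_prefix", "https://objects.example.com")

    def upload_objects(self, objects):
        # Pretend to upload each object; return one URL per object ID, in
        # the same order, so the URLs can be recorded in
        # splitgraph_meta.external_objects.
        return ["%s/%s" % (self.url_prefix, object_id) for object_id in objects]

    def download_objects(self, objects):
        # objects is a list of (object_id, object_url) tuples previously
        # produced by upload_objects.
        for object_id, object_url in objects:
            # Fetch object_url and write the object into the local cache.
            pass


handler = HTTPExternalObjectHandler({"url_prefix": "https://objects.example.com"})
urls = handler.upload_objects(["o1", "o2"])
print(urls)  # one URL per object, same length as the input list
```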

splitgraph.hooks.external_objects.get_external_object_handler(name, handler_params)

Load an external protocol handler by its name, initializing it with optional parameters.

splitgraph.hooks.external_objects.register_upload_download_handler(name, handler_class)

Register an external protocol handler. See ExternalObjectHandler for the methods a handler class must implement.

splitgraph.hooks.mount_handlers module

Hooks for additional handlers used to mount other databases via FDW. These handlers become available in the command line tool (via sgr mount) and in the Splitfile interpreter (via FROM MOUNT).

splitgraph.hooks.mount_handlers.get_mount_handler(mount_handler)

Returns a mount function for a given handler. The mount function must have a signature (mountpoint, server, port, username, password, handler_kwargs).

splitgraph.hooks.mount_handlers.get_mount_handlers()

Returns the names of all registered mount handlers.

splitgraph.hooks.mount_handlers.init_fdw(engine, server_id, wrapper, server_options=None, user_options=None, overwrite=True)

Sets up a foreign data server on the engine.

Parameters
  • engine – PostgresEngine

  • server_id – Name for the foreign server; must be unique. If it already exists, it is deleted and recreated (see overwrite).

  • wrapper – Name of the foreign data wrapper (must be installed as an extension on the engine)

  • server_options – Dictionary of FDW options

  • user_options – Dictionary of user options

  • overwrite – If the server already exists, delete and recreate it.
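For instance, setting up a postgres_fdw foreign server might look like the sketch below. The hostname, credentials and server name are made up for illustration, and the final call is shown commented out since it requires a live PostgresEngine.

```python
# Illustrative FDW server/user options for a postgres_fdw foreign server.
# All connection details here are placeholders.
server_options = {
    "host": "data.example.com",
    "port": "5432",
    "dbname": "external_db",
}
user_options = {"user": "read_only", "password": "secret"}

# With a live PostgresEngine:
# init_fdw(engine, server_id="example_server", wrapper="postgres_fdw",
#          server_options=server_options, user_options=user_options,
#          overwrite=True)
```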

splitgraph.hooks.mount_handlers.mount(mountpoint, mount_handler, handler_kwargs)

Mounts a foreign database via Postgres FDW (without creating new Splitgraph objects).

Parameters
  • mountpoint – Mountpoint to import the new tables into.

  • mount_handler – The type of the mounted database, e.g. postgres_fdw or mongo_fdw (see get_mount_handlers for all registered handler names).

  • handler_kwargs – Dictionary of options to pass to the mount handler.
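A hypothetical invocation for a remote Postgres schema is sketched below; the connection details are placeholders and the mount call itself is commented out since it needs a live engine.

```python
# Hypothetical connection details for mounting a remote Postgres schema.
handler_kwargs = {
    "server": "data.example.com",
    "port": 5432,
    "username": "read_only",
    "password": "secret",
    "dbname": "external_db",
    "remote_schema": "public",
}

# Against a live engine, this would create foreign tables for the remote
# schema inside the "external_data" schema:
# mount("external_data", "postgres_fdw", handler_kwargs)
```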

splitgraph.hooks.mount_handlers.mount_mongo(mountpoint, server, port, username, password, **table_spec)

Mount a Mongo database.

Mounts one or more collections on a remote Mongo database as a set of foreign tables locally. 

Parameters
  • mountpoint – Schema to mount the remote into.

  • server – Database hostname.

  • port – Port the Mongo server is running on.

  • username – A read-only user that the database will be accessed as.

  • password – Password for the read-only user.

  • table_spec – A dictionary of the form {"table_name": {"db": <dbname>, "coll": <collection>, "schema": {"col1": "type1", ...}}}.
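The table_spec shape can be illustrated with a hypothetical mapping of a single collection; all names and types below are invented for the example.

```python
# Hypothetical table_spec mapping one Mongo collection to a foreign table
# called "orders" with three typed columns.
table_spec = {
    "orders": {
        "db": "shop",
        "coll": "orders",
        "schema": {
            "customer_id": "integer",
            "total": "numeric",
            "placed_at": "timestamp",
        },
    }
}

# Against a live engine:
# mount_mongo("mongo_data", "mongo.example.com", 27017,
#             "read_only", "secret", **table_spec)
```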

splitgraph.hooks.mount_handlers.mount_mysql(mountpoint, server, port, username, password, remote_schema, tables=None)

Mount a MySQL database.

Mounts a schema on a remote MySQL database as a set of foreign tables locally. 

Parameters
  • mountpoint – Schema to mount the remote into.

  • server – Database hostname.

  • port – Database port.

  • username – A read-only user that the database will be accessed as.

  • password – Password for the read-only user.

  • remote_schema – Remote schema name.

  • tables – Tables to mount (default all).

splitgraph.hooks.mount_handlers.mount_postgres(mountpoint, server, port, username, password, dbname, remote_schema, tables=None)

Mount a Postgres database.

Mounts a schema on a remote Postgres database as a set of foreign tables locally. 

Parameters
  • mountpoint – Schema to mount the remote into.

  • server – Database hostname.

  • port – Port the Postgres server is running on.

  • username – A read-only user that the database will be accessed as.

  • password – Password for the read-only user.

  • dbname – Remote database name.

  • remote_schema – Remote schema name.

  • tables – Tables to mount (default all).

splitgraph.hooks.mount_handlers.register_mount_handler(name, mount_function)

Registers a mount function under a given name. See get_mount_handler for the mount handler spec.
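A custom handler only needs a function with the signature described in get_mount_handler. The handler below is a hypothetical no-op sketch (the name sqlite_fdw and the db_path argument are invented for illustration):

```python
# Hypothetical mount handler with the signature expected by
# get_mount_handler: (mountpoint, server, port, username, password,
# plus handler-specific keyword arguments).
def mount_sqlite(mountpoint, server, port, username, password, db_path=None):
    # A real handler would call init_fdw() to set up a foreign server and
    # then create foreign tables inside `mountpoint`.
    pass

# register_mount_handler("sqlite_fdw", mount_sqlite)
```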

splitgraph.hooks.s3 module

Plugin for uploading Splitgraph objects from the cache to an external S3-like object store.

class splitgraph.hooks.s3.S3ExternalObjectHandler(params)

Bases: splitgraph.hooks.external_objects.ExternalObjectHandler

Uploads/downloads the objects to/from an S3/S3-compatible host using the Minio client. The parameters for this handler (overriding the .sgconfig) are:

  • host: default SG_S3_HOST

  • port: default SG_S3_PORT

  • access_key: default SG_S3_KEY

  • bucket: default same as access_key

  • secret_key: default SG_S3_PWD

You can also specify the number of worker threads (threads) used to upload the objects.
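As an illustration, the handler parameters could be overridden like the hypothetical dictionary below; all values are placeholders, and the keys mirror the defaults listed above.

```python
# Hypothetical parameter override for the S3 handler. Keys mirror the
# defaults listed above (host, port, access_key, secret_key, bucket,
# threads); all values are placeholders.
s3_params = {
    "host": "objectstore.example.com",
    "port": 9000,
    "access_key": "AKIAEXAMPLE",
    "secret_key": "secret",
    "bucket": "AKIAEXAMPLE",  # defaults to access_key if omitted
    "threads": 4,
}

# handler = S3ExternalObjectHandler(s3_params)
```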

download_objects(objects)

Download objects from Minio.

Parameters

objects – List of (object ID, object URL) tuples, where each URL has the form <endpoint>/<bucket>/<key>.

upload_objects(objects)

Upload objects to Minio.

Parameters

objects – List of object IDs to upload

Returns

List of URLs the objects were stored at.

splitgraph.hooks.splitfile_commands module

A framework for custom Splitfile commands. The execution flow is as follows:

  • When the Splitfile executor finds an unknown command, it looks for an entry in the config file:

    [commands]
    RUN=splitgraph.plugins.Run
    
  • The command class must extend PluginCommand and is instantiated at every invocation.

  • The command’s calc_hash() method is run. The resultant command context hash is combined with the current image hash to produce the new image hash: if it already exists, then the image is simply checked out.

  • Otherwise (or if calc_hash is undefined or returns None), execute(), where the actual command should be implemented, is run. If it returns a hash, this hash is used for the new image. If this hash already exists, the existing image is checked out instead. If the command returns None, a random hash is generated for the new image.

class splitgraph.hooks.splitfile_commands.PluginCommand

Bases: object

Base class for custom Splitfile commands.

calc_hash(repository, args)

Calculates the command context hash for this custom command. If either the command context hash or the previous image hash has changed, then the image hash produced by this command will change. Consequently, two commands with the same command context hashes are assumed to have the same effect on any Splitgraph images.

This is supposed to be a lightweight method intended for pre-flight image hash calculations (without performing the actual transformation). If it returns None, the actual transformation is run anyway.

For example, for a command that imports some data from an external URL, this could be the hash of the last modified timestamp provided by the external data vendor. If the timestamp is unchanged, the data is unchanged and so the actual command won’t be re-executed.

Parameters
  • repository – SG Repository object pointing to a schema with the checked-out image the command is being run against.

  • args – Positional arguments to the command

Returns

Command context hash (a string of 64 hexadecimal digits)

execute(repository, args)

Execute the custom command against the target schema, optionally returning the new image hash. The contract for the command is as follows (though it is not currently enforced by the runtime):

  • Has to use get_engine().run_sql (or run_sql_batch) to interact with the engine.

  • Can only write to the schema with the checked-out repository (run_sql runs non-schema-qualified statements against the correct schema).

  • Can inspect splitgraph_meta (e.g. to find the current HEAD) for the repository.

  • Can’t alter the versioning of the repository.

Parameters
  • repository – SG Repository object pointing to a schema with the checked-out image the command is being run against.

  • args – Positional arguments to the command

Returns

Command context hash (a string of 64 hexadecimal digits). If calc_hash() had previously returned a hash, this hash is ignored. If both this command and calc_hash() return None, the hash is randomly generated.
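Putting the contract together, a custom command might look like the hypothetical sketch below. The base class is omitted so the sketch stays self-contained; a real command would extend splitgraph.hooks.splitfile_commands.PluginCommand and run its SQL via the engine.

```python
import hashlib

# Hypothetical custom Splitfile command. A real command would subclass
# splitgraph.hooks.splitfile_commands.PluginCommand and interact with the
# engine via run_sql.
class DummyCommand:
    def calc_hash(self, repository, args):
        # Lightweight pre-flight hash: depends only on the arguments, so
        # re-running the command with the same args reuses the old image.
        return hashlib.sha256(" ".join(args).encode("utf-8")).hexdigest()

    def execute(self, repository, args):
        # The actual transformation would run SQL against the checked-out
        # schema here. Returning None makes the executor fall back to the
        # calc_hash() result (or a random hash if both return None).
        return None


command = DummyCommand()
context_hash = command.calc_hash(None, ["some", "args"])
print(len(context_hash))  # 64 hexadecimal digits
```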

Module contents

Various hooks for extending Splitgraph, including external object handlers (for storing objects in locations such as S3), mount handlers (for mounting other databases via FDW) and custom Splitfile commands.