Splitgraph has been acquired by EDB! Read the blog post.

splitgraph.hooks.data_source package

Submodules

splitgraph.hooks.data_source.base module

class splitgraph.hooks.data_source.base.DataSource(engine: PostgresEngine, credentials: Credentials, params: Params, tables: Optional[Union[List[str], Dict[str, Tuple[List[splitgraph.core.types.TableColumn], TableParams]]]] = None)

Bases: abc.ABC

credentials_schema: Dict[str, Any] = {'type': 'object'}
abstract classmethod get_description() str
classmethod get_icon() Optional[bytes]
abstract classmethod get_name() str
get_raw_url(tables: Optional[Union[List[str], Dict[str, Tuple[List[splitgraph.core.types.TableColumn], TableParams]]]] = None, expiry: int = 3600) Dict[str, List[Tuple[str, str]]]

Get a list of public URLs for each table in this data source, e.g. to export the data as CSV. These may be temporary (e.g. pre-signed S3 URLs) but should be accessible without authentication. :param tables: A TableInfo object overriding the table params of the source :param expiry: The URL should be valid for at least this many seconds :return: Dict of table_name -> list of (mimetype, raw URL)

abstract introspect() IntrospectionResult
params_schema: Dict[str, Any] = {'type': 'object'}
supports_load = False
supports_mount = False
supports_sync = False
table_params_schema: Dict[str, Any] = {'type': 'object'}
class splitgraph.hooks.data_source.base.LoadableDataSource(engine: PostgresEngine, credentials: Credentials, params: Params, tables: Optional[Union[List[str], Dict[str, Tuple[List[splitgraph.core.types.TableColumn], TableParams]]]] = None)

Bases: splitgraph.hooks.data_source.base.DataSource, abc.ABC

load(repository: splitgraph.core.repository.Repository, tables: Optional[Union[List[str], Dict[str, Tuple[List[splitgraph.core.types.TableColumn], TableParams]]]] = None) str
supports_load = True
class splitgraph.hooks.data_source.base.MountableDataSource(engine: PostgresEngine, credentials: Credentials, params: Params, tables: Optional[Union[List[str], Dict[str, Tuple[List[splitgraph.core.types.TableColumn], TableParams]]]] = None)

Bases: splitgraph.hooks.data_source.base.DataSource, abc.ABC

abstract mount(schema: str, tables: Optional[Union[List[str], Dict[str, Tuple[List[splitgraph.core.types.TableColumn], TableParams]]]] = None, overwrite: bool = True) Optional[List[splitgraph.core.types.MountError]]

Instantiate the data source as foreign tables in a schema

supports_mount = True
class splitgraph.hooks.data_source.base.SyncableDataSource(engine: PostgresEngine, credentials: Credentials, params: Params, tables: Optional[Union[List[str], Dict[str, Tuple[List[splitgraph.core.types.TableColumn], TableParams]]]] = None)

Bases: splitgraph.hooks.data_source.base.LoadableDataSource, splitgraph.hooks.data_source.base.DataSource, abc.ABC

supports_load = True
supports_sync = True
sync(repository: splitgraph.core.repository.Repository, image_hash: Optional[str], tables: Optional[Union[List[str], Dict[str, Tuple[List[splitgraph.core.types.TableColumn], TableParams]]]] = None) str
class splitgraph.hooks.data_source.base.TransformingDataSource(engine: PostgresEngine, credentials: Credentials, params: Params, tables: Optional[Union[List[str], Dict[str, Tuple[List[splitgraph.core.types.TableColumn], TableParams]]]] = None, image_mounter: Optional[splitgraph.core.image_mounting.ImageMounter] = None)

Bases: splitgraph.hooks.data_source.base.DataSource, abc.ABC

Data source that runs transformations between Splitgraph images. Takes in an extra parameter, an ImageMounter instance to manage temporary image checkouts.

abstract get_required_images() List[Tuple[str, str, str]]

Get images required by this data source. :returns List of tuples (namespace, repository, hash_or_tag)

mount_required_images() Generator[Dict[Tuple[str, str, str], str], None, None]

Mount all images required by this data source into temporary schemas. On exit from this context manager, unmounts them. :return: Map of (namespace, repository, hash_or_tag) -> schema where the image is mounted.

splitgraph.hooks.data_source.base.get_ingestion_state(repository: splitgraph.core.repository.Repository, image_hash: Optional[str]) Optional[SyncState]
splitgraph.hooks.data_source.base.getrandbits(k) x.  Generates an int with k random bits.
splitgraph.hooks.data_source.base.prepare_new_image(repository: splitgraph.core.repository.Repository, hash_or_tag: Optional[str], comment: str = 'Singer tap ingestion') Tuple[Optional[splitgraph.core.image.Image], str]

splitgraph.hooks.data_source.fdw module

class splitgraph.hooks.data_source.fdw.ElasticSearchDataSource(engine: PostgresEngine, credentials: Credentials, params: Params, tables: Optional[Union[List[str], Dict[str, Tuple[List[splitgraph.core.types.TableColumn], TableParams]]]] = None)

Bases: splitgraph.hooks.data_source.fdw.ForeignDataWrapperDataSource

commandline_help: str = 'Mount an ElasticSearch instance.\n\nMount a set of tables proxying to a remote ElasticSearch index.\n\nThis uses a fork of postgres-elasticsearch-fdw behind the scenes. You can add a column\n`query` to your table and set it as `query_column` to pass advanced ES queries and aggregations.\nFor example:\n\n\x08\n```\n$ sgr mount elasticsearch target_schema -c elasticsearch:9200 -o@- <<EOF\n    &lbrace;\n      "tables": &lbrace;\n        "table_1": &lbrace;\n          "schema": &lbrace;\n            "id": "text",\n            "@timestamp": "timestamp",\n            "query": "text",\n            "col_1": "text",\n            "col_2": "boolean"\n          &rbrace;,\n          "options": &lbrace;\n              "index": "index-pattern*",\n              "rowid_column": "id",\n              "query_column": "query"\n          &rbrace;\n        &rbrace;\n      &rbrace;\n    &rbrace;\nEOF\n\x08\n```\n'
credentials_schema: Dict[str, Any] = &lbrace;'properties': &lbrace;'password': &lbrace;'type': 'string'&rbrace;, 'username': &lbrace;'type': 'string'&rbrace;&rbrace;, 'type': 'object'&rbrace;
classmethod get_description() str
get_fdw_name()
classmethod get_name() str
get_server_options()
params_schema: Dict[str, Any] = &lbrace;'properties': &lbrace;'host': &lbrace;'type': 'string'&rbrace;, 'port': &lbrace;'type': 'integer'&rbrace;&rbrace;, 'required': ['host', 'port'], 'type': 'object'&rbrace;
table_params_schema: Dict[str, Any] = &lbrace;'properties': &lbrace;'index': &lbrace;'description': 'ES index name or pattern to use, for example, "events-*"', 'type': 'string'&rbrace;, 'query_column': &lbrace;'description': 'Name of the column to use to pass queries in', 'type': 'string'&rbrace;, 'score_column': &lbrace;'description': 'Name of the column with the document score', 'type': 'string'&rbrace;, 'scroll_duration': &lbrace;'description': 'How long to hold the scroll context open for, default 10m', 'type': 'string'&rbrace;, 'scroll_size': &lbrace;'description': 'Fetch size, default 1000', 'type': 'integer'&rbrace;, 'type': &lbrace;'description': 'Pre-ES7 doc_type, not required in ES7 or later', 'type': 'string'&rbrace;&rbrace;, 'required': ['index'], 'type': 'object'&rbrace;
class splitgraph.hooks.data_source.fdw.ForeignDataWrapperDataSource(engine: PostgresEngine, credentials: Credentials, params: Params, tables: Optional[Union[List[str], Dict[str, Tuple[List[splitgraph.core.types.TableColumn], TableParams]]]] = None)

Bases: splitgraph.hooks.data_source.base.MountableDataSource, splitgraph.hooks.data_source.base.LoadableDataSource, abc.ABC

commandline_help: str = ''
commandline_kwargs_help: str = ''
credentials_schema: Dict[str, Any] = &lbrace;'type': 'object'&rbrace;
classmethod from_commandline(engine, commandline_kwargs) splitgraph.hooks.data_source.fdw.ForeignDataWrapperDataSource

Instantiate an FDW data source from commandline arguments.

abstract get_fdw_name()
get_remote_schema_name() str

Override this if the FDW supports IMPORT FOREIGN SCHEMA

get_server_options() Mapping[str, str]
get_table_options(table_name: str, tables: Optional[Union[List[str], Dict[str, Tuple[List[splitgraph.core.types.TableColumn], TableParams]]]] = None) Dict[str, str]
get_table_schema(table_name: str, table_schema: List[splitgraph.core.types.TableColumn]) List[splitgraph.core.types.TableColumn]
get_user_options() Mapping[str, str]
introspect() IntrospectionResult
mount(schema: str, tables: Optional[Union[List[str], Dict[str, Tuple[List[splitgraph.core.types.TableColumn], TableParams]]]] = None, overwrite: bool = True) Optional[List[splitgraph.core.types.MountError]]

Instantiate the data source as foreign tables in a schema

params_schema: Dict[str, Any] = &lbrace;'type': 'object'&rbrace;
preview(tables: Optional[Union[List[str], Dict[str, Tuple[List[splitgraph.core.types.TableColumn], TableParams]]]]) PreviewResult
supports_load = True
supports_mount = True
table_params_schema: Dict[str, Any] = &lbrace;'type': 'object'&rbrace;
class splitgraph.hooks.data_source.fdw.MongoDataSource(engine: PostgresEngine, credentials: Credentials, params: Params, tables: Optional[Union[List[str], Dict[str, Tuple[List[splitgraph.core.types.TableColumn], TableParams]]]] = None)

Bases: splitgraph.hooks.data_source.fdw.ForeignDataWrapperDataSource

commandline_help: str = 'Mount a Mongo database.\n\nMounts one or more collections on a remote Mongo database as a set of foreign tables locally.'
commandline_kwargs_help: str = 'tables: A dictionary of form\n```\n&lbrace;\n    "table_name": &lbrace;\n        "schema": &lbrace;"col1": "type1"...&rbrace;,\n        "options": &lbrace;"database": <dbname>, "collection": <collection>&rbrace;\n    &rbrace;\n&rbrace;\n```\n'
credentials_schema: Dict[str, Any] = &lbrace;'properties': &lbrace;'password': &lbrace;'type': 'string'&rbrace;, 'username': &lbrace;'type': 'string'&rbrace;&rbrace;, 'required': ['username', 'password'], 'type': 'object'&rbrace;
classmethod get_description() str
get_fdw_name()
classmethod get_name() str
get_server_options()
get_table_schema(table_name, table_schema)
get_user_options()
params_schema: Dict[str, Any] = &lbrace;'properties': &lbrace;'host': &lbrace;'type': 'string'&rbrace;, 'port': &lbrace;'type': 'integer'&rbrace;&rbrace;, 'required': ['host', 'port'], 'type': 'object'&rbrace;
table_params_schema: Dict[str, Any] = &lbrace;'properties': &lbrace;'collection': &lbrace;'type': 'string'&rbrace;, 'database': &lbrace;'type': 'string'&rbrace;&rbrace;, 'required': ['database', 'collection'], 'type': 'object'&rbrace;
class splitgraph.hooks.data_source.fdw.MySQLDataSource(engine: PostgresEngine, credentials: Credentials, params: Params, tables: Optional[Union[List[str], Dict[str, Tuple[List[splitgraph.core.types.TableColumn], TableParams]]]] = None)

Bases: splitgraph.hooks.data_source.fdw.ForeignDataWrapperDataSource

commandline_help: str = 'Mount a MySQL database.\n\nMounts a schema on a remote MySQL database as a set of foreign tables locally.'
commandline_kwargs_help: str = 'dbname: Remote MySQL database name (required)\ntables: Tables to mount (default all). If a list, then will use IMPORT FOREIGN SCHEMA.\nIf a dictionary, must have the format\n    &lbrace;"table_name": &lbrace;"schema": &lbrace;"col_1": "type_1", ...&rbrace;,\n                    "options": &lbrace;[get passed to CREATE FOREIGN TABLE]&rbrace;&rbrace;&rbrace;.\n        '
credentials_schema: Dict[str, Any] = &lbrace;'properties': &lbrace;'password': &lbrace;'type': 'string'&rbrace;, 'username': &lbrace;'type': 'string'&rbrace;&rbrace;, 'required': ['username', 'password'], 'type': 'object'&rbrace;
classmethod get_description() str
get_fdw_name()
classmethod get_name() str
get_remote_schema_name() str

Override this if the FDW supports IMPORT FOREIGN SCHEMA

get_server_options()
get_table_options(table_name: str, tables: Optional[Union[List[str], Dict[str, Tuple[List[splitgraph.core.types.TableColumn], TableParams]]]] = None)
get_user_options()
params_schema: Dict[str, Any] = &lbrace;'properties': &lbrace;'dbname': &lbrace;'type': 'string'&rbrace;, 'host': &lbrace;'type': 'string'&rbrace;, 'port': &lbrace;'type': 'integer'&rbrace;&rbrace;, 'required': ['host', 'port', 'dbname'], 'type': 'object'&rbrace;
class splitgraph.hooks.data_source.fdw.PostgreSQLDataSource(engine: PostgresEngine, credentials: Credentials, params: Params, tables: Optional[Union[List[str], Dict[str, Tuple[List[splitgraph.core.types.TableColumn], TableParams]]]] = None)

Bases: splitgraph.hooks.data_source.fdw.ForeignDataWrapperDataSource

commandline_help: str = 'Mount a Postgres database.\n\nMounts a schema on a remote Postgres database as a set of foreign tables locally.'
commandline_kwargs_help: str = 'dbname: Database name (required)\nremote_schema: Remote schema name (required)\nextra_server_args: Dictionary of extra arguments to pass to the foreign server\ntables: Tables to mount (default all). If a list, then will use IMPORT FOREIGN SCHEMA.\nIf a dictionary, must have the format\n    &lbrace;"table_name": &lbrace;"schema": &lbrace;"col_1": "type_1", ...&rbrace;,\n                    "options": &lbrace;[get passed to CREATE FOREIGN TABLE]&rbrace;&rbrace;&rbrace;.\n    '
credentials_schema: Dict[str, Any] = &lbrace;'properties': &lbrace;'password': &lbrace;'type': 'string'&rbrace;, 'username': &lbrace;'type': 'string'&rbrace;&rbrace;, 'required': ['username', 'password'], 'type': 'object'&rbrace;
classmethod get_description() str
get_fdw_name()
classmethod get_name() str
get_remote_schema_name() str

Override this if the FDW supports IMPORT FOREIGN SCHEMA

get_server_options()
get_table_options(table_name: str, tables: Optional[Union[List[str], Dict[str, Tuple[List[splitgraph.core.types.TableColumn], TableParams]]]] = None)
get_user_options()
params_schema: Dict[str, Any] = &lbrace;'properties': &lbrace;'dbname': &lbrace;'description': 'Database name', 'type': 'string'&rbrace;, 'host': &lbrace;'description': 'Remote hostname', 'type': 'string'&rbrace;, 'port': &lbrace;'description': 'Port', 'type': 'integer'&rbrace;, 'remote_schema': &lbrace;'description': 'Remote schema name', 'type': 'string'&rbrace;&rbrace;, 'required': ['host', 'port', 'dbname', 'remote_schema'], 'type': 'object'&rbrace;
table_params_schema: Dict[str, Any] = &lbrace;'type': 'object'&rbrace;
splitgraph.hooks.data_source.fdw.create_foreign_table(schema: str, server: str, table_name: str, schema_spec: List[splitgraph.core.types.TableColumn], extra_options: Optional[Dict[str, str]] = None)
splitgraph.hooks.data_source.fdw.import_foreign_schema(engine: PsycopgEngine, mountpoint: str, remote_schema: str, server_id: str, tables: List[str], options: Optional[Dict[str, str]] = None) List[splitgraph.core.types.MountError]
splitgraph.hooks.data_source.fdw.init_fdw(engine: PsycopgEngine, server_id: str, wrapper: str, server_options: Optional[Mapping[str, Optional[str]]] = None, user_options: Optional[Mapping[str, str]] = None, role: Optional[str] = None, overwrite: bool = True) None

Sets up a foreign data server on the engine.

Parameters
  • engine – PostgresEngine

  • server_id – Name to call the foreign server, must be unique. Will be deleted if exists.

  • wrapper – Name of the foreign data wrapper (must be installed as an extension on the engine)

  • server_options – Dictionary of FDW options

  • user_options – Dictionary of user options

  • role – The name of the role for which the user mapping is created; defaults to public.

  • overwrite – If the server already exists, delete and recreate it.

Module contents

splitgraph.hooks.data_source.get_data_source(data_source: str) Type[splitgraph.hooks.data_source.base.DataSource]

Returns a class for a given data source

splitgraph.hooks.data_source.get_data_sources() List[str]

Returns the names of all registered data sources.

splitgraph.hooks.data_source.merge_jsonschema(left: Dict[str, Any], right: Dict[str, Any]) Dict[str, Any]
splitgraph.hooks.data_source.register_data_source(name: str, data_source_class: Type[splitgraph.hooks.data_source.base.DataSource]) None

Returns a data source under a given name.