
splitgraph.ingestion package

Subpackages

Submodules

splitgraph.ingestion.common module

class splitgraph.ingestion.common.IngestionAdapter

Bases: object

abstract create_ingestion_table(data, engine, schema: str, table: str, **kwargs)
abstract data_to_new_table(data, engine, schema: str, table: str, no_header: bool = True, **kwargs)
abstract query_to_data(engine, query: str, schema: Optional[str] = None, **kwargs)
to_data(query: str, image: Optional[Union[splitgraph.core.image.Image, str]] = None, repository: Optional[splitgraph.core.repository.Repository] = None, use_lq: bool = False, **kwargs)
to_table(data, repository: splitgraph.core.repository.Repository, table: str, if_exists: str = 'patch', schema_check: bool = True, no_header: bool = False, **kwargs)
splitgraph.ingestion.common.add_timestamp_tags(repository: splitgraph.core.repository.Repository, image_hash: str)
splitgraph.ingestion.common.build_commandline_help(json_schema)
splitgraph.ingestion.common.dedupe_sg_schema(schema_spec: List[splitgraph.core.types.TableColumn], prefix_len: int = 59) List[splitgraph.core.types.TableColumn]

Some foreign schemas have column names longer than 63 characters where the first 63 characters are identical across several columns (e.g. odn.data.socrata.com). This routine renames columns in the schema so that this can't happen, giving duplicates a numeric suffix.
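
A minimal sketch of the deduplication, assuming TableColumn is the (ordinal, name, pg_type, is_pk) namedtuple from splitgraph.core.types; the exact suffix format applied to duplicates is an implementation detail:

    from splitgraph.core.types import TableColumn
    from splitgraph.ingestion.common import dedupe_sg_schema

    # Two columns that share their first 63+ characters, plus a short unique one.
    long_name = "measurement_" + "x" * 60
    schema = [
        TableColumn(1, long_name + "_a", "text", False),
        TableColumn(2, long_name + "_b", "text", False),
        TableColumn(3, "id", "integer", True),
    ]

    # Columns with clashing prefixes are truncated to prefix_len and given a
    # numeric suffix; short, unique names like "id" are left untouched.
    for col in dedupe_sg_schema(schema):
        print(col.ordinal, col.name, col.pg_type, col.is_pk)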

splitgraph.ingestion.common.generate_column_names(schema_spec: List[splitgraph.core.types.TableColumn], prefix: str = 'col_') List[splitgraph.core.types.TableColumn]

Replace empty column names with autogenerated ones.
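
A quick illustration, assuming the same TableColumn namedtuple; the exact numbering of the generated names is an assumption:

    from splitgraph.core.types import TableColumn
    from splitgraph.ingestion.common import generate_column_names

    schema = [
        TableColumn(1, "", "integer", False),
        TableColumn(2, "price", "numeric", False),
        TableColumn(3, "", "text", False),
    ]

    # Empty names become prefix + a number (e.g. col_1, col_3);
    # non-empty names are kept as-is.
    print(generate_column_names(schema, prefix="col_"))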

splitgraph.ingestion.common.merge_tables(engine: splitgraph.engine.postgres.engine.PsycopgEngine, source_schema: str, source_table: str, source_schema_spec: List[splitgraph.core.types.TableColumn], target_schema: str, target_table: str, target_schema_spec: List[splitgraph.core.types.TableColumn])
splitgraph.ingestion.common.schema_compatible(source_schema: List[splitgraph.core.types.TableColumn], target_schema: List[splitgraph.core.types.TableColumn]) bool

Quick check to see if a dataframe with target_schema can be written into source_schema. There are some implicit type conversions that SQLAlchemy/Pandas can do, so we don't want to fail immediately if the column types aren't exactly the same (e.g. bigint vs numeric). Most errors should be caught by PG itself.

Schema is a list of (ordinal, name, type, is_pk).
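
A sketch of the compatibility check, assuming the same TableColumn namedtuple; per the description above, a difference in exact column types alone should not make it fail:

    from splitgraph.core.types import TableColumn
    from splitgraph.ingestion.common import schema_compatible

    source = [
        TableColumn(1, "id", "bigint", True),
        TableColumn(2, "value", "numeric", False),
    ]
    # Same columns, slightly different types: Postgres/Pandas can coerce these.
    target = [
        TableColumn(1, "id", "integer", True),
        TableColumn(2, "value", "double precision", False),
    ]

    print(schema_compatible(source, target))  # expected to be True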

splitgraph.ingestion.inference module

splitgraph.ingestion.inference.infer_sg_schema(sample: Sequence[List[str]], override_types: Optional[Dict[str, str]] = None, primary_keys: Optional[List[str]] = None)
splitgraph.ingestion.inference.parse_bigint(integer: str)
splitgraph.ingestion.inference.parse_boolean(boolean: str)
splitgraph.ingestion.inference.parse_int(integer: str)
splitgraph.ingestion.inference.parse_json(json_s: str)
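
A hedged sketch of schema inference from a small CSV-style sample, assuming the first row of the sample is treated as the header (the parse_* helpers above are presumably what the inference uses to probe candidate types):

    from splitgraph.ingestion.inference import infer_sg_schema

    sample = [
        ["id", "price", "is_active", "metadata"],
        ["1", "9.99", "true", '{"tags": ["a"]}'],
        ["2", "12.50", "false", '{"tags": []}'],
    ]

    schema = infer_sg_schema(
        sample,
        override_types={"id": "bigint"},  # force a type instead of the inferred one
        primary_keys=["id"],              # mark "id" as a primary key column
    )
    for col in schema:
        print(col)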

splitgraph.ingestion.pandas module

Routines that ingest/export CSV files to/from Splitgraph images using Pandas.

class splitgraph.ingestion.pandas.PandasIngestionAdapter

Bases: splitgraph.ingestion.common.IngestionAdapter

static create_ingestion_table(data, engine, schema: str, table: str, **kwargs)
static data_to_new_table(data, engine: PsycopgEngine, schema: str, table: str, no_header: bool = True, **kwargs)
static query_to_data(engine, query: str, schema: Optional[str] = None, **kwargs)
splitgraph.ingestion.pandas.df_to_table(df: Union[pandas.core.series.Series, pandas.core.frame.DataFrame], repository: splitgraph.core.repository.Repository, table: str, if_exists: str = 'patch', schema_check: bool = True) None

Writes a Pandas DataFrame to a checked-out Splitgraph table. Doesn’t create a new image.

Parameters
  • df – Pandas DataFrame to insert.

  • repository – Splitgraph Repository object. Must be checked out.

  • table – Table name.

  • if_exists – Behaviour if the table already exists: ‘patch’ means that primary keys that already exist in the table will be updated and ones that don’t will be inserted; ‘replace’ means that the table will be dropped and recreated.

  • schema_check – If False, skips checking that the dataframe is compatible with the target schema.
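
A minimal usage sketch, assuming "my_namespace/weather" is a hypothetical repository that already exists locally and is checked out, and that the DataFrame's named index supplies the table's primary key:

    import pandas as pd

    from splitgraph.core.repository import Repository
    from splitgraph.ingestion.pandas import df_to_table

    repo = Repository("my_namespace", "weather")  # hypothetical, checked-out repository

    df = pd.DataFrame(
        {"city": ["London", "Berlin"], "temp_c": [11.5, 9.0]},
        index=pd.Index([1, 2], name="id"),  # assumed to become the primary key
    )

    # 'patch' upserts rows by primary key; 'replace' would drop and recreate the table.
    df_to_table(df, repository=repo, table="temperature", if_exists="patch")

    # df_to_table doesn't create an image, so commit the change explicitly.
    repo.commit()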

splitgraph.ingestion.pandas.df_to_table_fast(engine: PsycopgEngine, df: Union[pandas.core.series.Series, pandas.core.frame.DataFrame], target_schema: str, target_table: str)
splitgraph.ingestion.pandas.sql_to_df(sql: str, image: Optional[Union[splitgraph.core.image.Image, str]] = None, repository: Optional[splitgraph.core.repository.Repository] = None, use_lq: bool = False, **kwargs) pandas.core.frame.DataFrame

Executes an SQL query against a Splitgraph image, returning the result.

Extra **kwargs are passed to Pandas’ read_sql_query.

Parameters
  • sql – SQL query to execute.

  • image – Image object, image hash/tag (str) or None (use the currently checked out image).

  • repository – Repository the image belongs to. Must be set if image is a hash/tag or None.

  • use_lq – Whether to use layered querying or check out the image if it’s not checked out.

Returns

A Pandas dataframe.
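
A usage sketch against the same hypothetical "my_namespace/weather" repository, querying a tagged image via layered querying so it doesn't have to be checked out:

    from splitgraph.core.repository import Repository
    from splitgraph.ingestion.pandas import sql_to_df

    repo = Repository("my_namespace", "weather")  # hypothetical repository

    df = sql_to_df(
        "SELECT city, temp_c FROM temperature WHERE temp_c > 10",
        image="latest",    # image tag; None would use the checked-out image
        repository=repo,
        use_lq=True,       # layered querying: no checkout needed
    )
    print(df.head())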

Module contents