splitgraph.core.indexing package

Module contents


splitgraph.core.indexing.bloom module

Bloom filtering on fragments for equality queries.

splitgraph.core.indexing.bloom.describe(index_tuple: Tuple[int, str]) -> str

Returns a pretty-printed summary of the bloom filter.


index_tuple – Tuple of (k, base64-encoded fingerprint) returned by generate_bloom_index



splitgraph.core.indexing.bloom.filter_bloom_index(engine: PsycopgEngine, object_ids: List[str], quals: Any) -> List[str]

Runs a bloom filter on given qualifiers using the given objects’ previously-generated fingerprints.

  • engine – Object engine

  • object_ids – Object IDs

  • quals – List of qualifiers


List of object IDs that might match the qualifiers in quals (including IDs that don’t have a bloom index).
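
The filtering semantics described above can be sketched in a few lines. This is an illustration only, not Splitgraph's implementation: the hashing scheme (SHA-256 with a per-hash prefix), the in-memory `indexes` dictionary, and the helper names `bit_positions`, `build_filter`, `might_contain` and `filter_objects` are all assumptions made for the example. The point it shows is that an object is dropped only when its filter proves the value is absent, and that objects without a bloom index are always kept.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

# (k, raw filter bytes); a hypothetical in-memory stand-in for a stored index.
BloomIndex = Tuple[int, bytes]


def bit_positions(value: str, k: int, num_bits: int) -> List[int]:
    """Derive k bit positions for a value (hypothetical hashing scheme)."""
    positions = []
    for i in range(k):
        digest = hashlib.sha256(f"{i}:{value}".encode()).digest()
        positions.append(int.from_bytes(digest[:8], "big") % num_bits)
    return positions


def build_filter(values: List[str], k: int, size_bytes: int) -> bytes:
    """Set k bits in the filter for every value."""
    bits = bytearray(size_bytes)
    for value in values:
        for pos in bit_positions(value, k, size_bytes * 8):
            bits[pos // 8] |= 1 << (pos % 8)
    return bytes(bits)


def might_contain(index: BloomIndex, value: str) -> bool:
    """False means 'definitely absent'; True means 'possibly present'."""
    k, bits = index
    return all(
        bits[pos // 8] & (1 << (pos % 8))
        for pos in bit_positions(value, k, len(bits) * 8)
    )


def filter_objects(
    indexes: Dict[str, Optional[BloomIndex]],
    object_ids: List[str],
    value: str,
) -> List[str]:
    """Keep objects that might hold the value, plus objects with no index."""
    kept = []
    for object_id in object_ids:
        index = indexes.get(object_id)
        if index is None or might_contain(index, value):
            kept.append(object_id)
    return kept
```

A bloom filter never produces false negatives, so any value that was added to a filter is always reported as possibly present.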

splitgraph.core.indexing.bloom.generate_bloom_index(engine: PsycopgEngine, object_id: str, changeset: Optional[Dict[Tuple[str, ], Tuple[bool, Dict[str, Any], Dict[str, Any]]]], column: str, probability: Optional[float] = None, size: Optional[int] = None) -> Tuple[int, str]

Generates a bloom filter signature for a given column and a given fragment. A bloom filter can tell whether an item is definitely not in a given set or possibly is in it.

The tradeoff is between the probability of a false positive (item said to be in the set when it actually isn’t) and the size of the filter.

Bloom filters also have an extra parameter, k: the number of bits in the signature that each item sets. Its optimal value follows from the number of distinct items or the target probability, so the user does not pass it explicitly.

  • engine – Object engine the fragment is cached in.

  • object_id – Fragment ID

  • changeset – Optional. If specified, the old column values are also included in the index.

  • column – Column name to generate the index on.

  • probability – Probability of a false positive. Either this or the size of the filter must be specified, but not both.

  • size – Size of the filter, in bytes.


Tuple of (k, base64-encoded fingerprint) to be inserted into the index.
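
The tradeoff between false-positive probability, filter size and k can be made concrete with the standard textbook bloom filter formulas. Whether generate_bloom_index uses exactly these formulas is an assumption; the function names below are hypothetical and are not part of the Splitgraph API.

```python
import math


def filter_size_bits(distinct_items: int, probability: float) -> int:
    """Bits needed to hold `distinct_items` at the target false-positive rate:
    m = -n * ln(p) / (ln 2)^2."""
    return math.ceil(-distinct_items * math.log(probability) / (math.log(2) ** 2))


def optimal_k(size_bits: int, distinct_items: int) -> int:
    """Number of bits each item should set: k = (m / n) * ln 2, at least 1."""
    return max(1, round(size_bits / distinct_items * math.log(2)))


def false_positive_rate(size_bits: int, distinct_items: int, k: int) -> float:
    """Approximate false-positive rate: (1 - e^(-k * n / m))^k."""
    return (1 - math.exp(-k * distinct_items / size_bits)) ** k


if __name__ == "__main__":
    n = 1000
    m = filter_size_bits(n, 0.01)
    k = optimal_k(m, n)
    print(m, k, false_positive_rate(m, n, k))
```

Either a target probability or a fixed size determines the remaining quantities, which is consistent with the function accepting one of the two but not both.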

splitgraph.core.indexing.range module

splitgraph.core.indexing.range.extract_min_max_pks(engine: PsycopgEngine, fragments: List[str], table_pks: List[str], table_pk_types: List[str]) -> Any

Extract minimum/maximum PK values for given fragments.

  • engine – Engine the objects live on

  • fragments – IDs of objects

  • table_pks – List of columns forming the table primary key

  • table_pk_types – List of types for table PK columns


List of min/max primary key values, one entry per object.

splitgraph.core.indexing.range.filter_range_index(metadata_engine: PsycopgEngine, object_ids: List[str], quals: Any, column_types: Dict[str, str]) -> List[str]

splitgraph.core.indexing.range.generate_range_index(object_engine: PsycopgEngine, object_id: str, table_schema: TableSchema, changeset: Optional[Dict[Tuple[str, ], Tuple[bool, Dict[str, Any], Dict[str, Any]]]], columns: Optional[List[str]] = None) -> Dict[str, Tuple[T, T]]

Calculate the minimum/maximum values of every column in the object (including deleted values).

  • object_engine – Engine the object is located on

  • object_id – ID of the object.

  • table_schema – Schema of the table

  • changeset – Changeset (old values will be included in the index)

  • columns – Columns to run the index on (default all)


Dictionary of {column: [min, max]}
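
The idea behind a range index can be sketched over plain Python dictionaries. This is a simplified illustration under stated assumptions, not the library's code: rows are modelled as dicts, `old_rows` stands in for the old values carried by a changeset, and `min_max_per_column` and `might_match` are hypothetical helper names. It shows why deleted values are included: the index must stay conservative, so it may only rule an object out when no current or previous value could match.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]
RangeIndex = Dict[str, Tuple[Any, Any]]


def min_max_per_column(
    rows: List[Row],
    old_rows: Optional[List[Row]] = None,
    columns: Optional[List[str]] = None,
) -> RangeIndex:
    """Compute (min, max) per column over current rows and old (deleted) rows."""
    index: RangeIndex = {}
    for row in list(rows) + list(old_rows or []):
        for column, value in row.items():
            if columns is not None and column not in columns:
                continue
            if value is None:
                continue  # NULLs do not take part in min/max
            if column not in index:
                index[column] = (value, value)
            else:
                low, high = index[column]
                index[column] = (min(low, value), max(high, value))
    return index


def might_match(index: RangeIndex, column: str, operator: str, value: Any) -> bool:
    """Return False only if the range proves no value can satisfy the qualifier."""
    if column not in index:
        return True  # no information, so the object cannot be ruled out
    low, high = index[column]
    if operator == "=":
        return low <= value <= high
    if operator == "<":
        return low < value
    if operator == "<=":
        return low <= value
    if operator == ">":
        return high > value
    if operator == ">=":
        return high >= value
    return True  # unknown operators are treated conservatively
```

In this sketch, a qualifier on a column that is missing from the index never excludes the object, mirroring how objects without a bloom index are retained by filter_bloom_index.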

splitgraph.core.indexing.range.quals_to_sql(quals: Optional[Sequence[Sequence[Tuple[str, str, Any]]]], column_types: Dict[str, str]) -> Tuple[psycopg2.sql.Composable, Tuple]

Convert a list of qualifiers in CNF to a fragment of a Postgres query.

  • quals – Qualifiers in CNF

  • column_types – Dictionary of column names and their types


SQL Composable object and a tuple of arguments to be mogrified into it.
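
A CNF qualifier list of this shape is an AND of clauses, each of which is an OR of (column, operator, value) triples. The sketch below illustrates that translation using plain strings with %s placeholders. It is a simplification and not the actual function: the real quals_to_sql returns a psycopg2.sql.Composable and takes column types into account, while the helper names `quote_ident` and `cnf_to_sql`, the operator whitelist, and the handling of empty input here are assumptions made for the example.

```python
from typing import Any, List, Optional, Sequence, Tuple

Qual = Tuple[str, str, Any]

# Hypothetical whitelist; the operators the real function supports may differ.
ALLOWED_OPERATORS = {"=", "<>", "<", "<=", ">", ">="}


def quote_ident(name: str) -> str:
    """Quote a column name as an SQL identifier, doubling embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def cnf_to_sql(quals: Optional[Sequence[Sequence[Qual]]]) -> Tuple[str, Tuple[Any, ...]]:
    """Turn [[q1 OR q2] AND [q3]] into an SQL fragment plus its arguments."""
    if not quals:
        return "TRUE", ()  # no qualifiers: nothing is filtered out
    clauses: List[str] = []
    args: List[Any] = []
    for disjunction in quals:
        parts = []
        for column, operator, value in disjunction:
            if operator not in ALLOWED_OPERATORS:
                raise ValueError(f"Unsupported operator: {operator}")
            # Values are passed separately so the driver can escape them.
            parts.append(f"{quote_ident(column)} {operator} %s")
            args.append(value)
        # An empty OR-clause can never be satisfied.
        clauses.append("(" + " OR ".join(parts) + ")" if parts else "FALSE")
    return " AND ".join(clauses), tuple(args)
```

Keeping the values out of the SQL text and returning them as a separate tuple matches the documented return shape, where the arguments are mogrified into the query by the database driver.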