
seafowl.toml configuration

Using environment variables

Seafowl supports sourcing configuration values from environment variables.

The environment variable format is: SEAFOWL__[section]__[section]__[key]=value. Section and key names are separated by a double underscore (__). Dots in names must also be replaced with a double underscore.

Environment variables take precedence over the config file.

For example: SEAFOWL__FRONTEND__HTTP__WRITE_ACCESS=off is equivalent to setting the configuration parameter frontend.http.write_access=off.
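To make the mapping concrete, the fragment below shows the seafowl.toml form of the example above, with the corresponding environment variable in a comment. The second setting, bind_port, is an extra illustration chosen here; its variable name is derived from the naming rule above and is not quoted from elsewhere.

```toml
# SEAFOWL__FRONTEND__HTTP__WRITE_ACCESS=off
# SEAFOWL__FRONTEND__HTTP__BIND_PORT=8080   (assumed, derived from the naming rule)
[frontend.http]
write_access = "off"
bind_port = 8080
```

If both the file and the environment set the same key, the environment variable wins.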

object_store section

This section contains the configuration for the object store used by Seafowl to store data.

Select the object store by setting a type=... parameter and configure it by adding extra fields for the specific flavor.

type = "local"

Default. Store data files on the local filesystem.

data_dir

The directory to store data files in. Default ./seafowl-data.
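As a minimal sketch, a local object store that spells out the documented default directory could look like this:

```toml
[object_store]
type = "local"
# Same as the default; change this to store data files elsewhere.
data_dir = "./seafowl-data"
```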

type = "memory"

Store the data in RAM. This does not support any other parameters.

Note that when using this option, restarting the process will lose all data. In addition, combining an in-memory catalog with a persistent object store (or vice versa) will lead to consistency issues.
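One way to avoid the consistency issue described above is to keep both the object store and the catalog in memory. The sketch below assumes a throwaway instance (for example, for tests) and uses the :memory: SQLite DSN described in the catalog section:

```toml
# Fully ephemeral setup: all data and metadata are lost on restart.
[object_store]
type = "memory"

[catalog]
type = "sqlite"
dsn = ":memory:"
```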

type = "s3"

Store data files in S3-compatible object storage such as S3 itself, MinIO, Cloudflare R2 etc.

⚠️ NOTE: If you're using actual AWS S3, do not specify endpoint; specify only region.

region

AWS S3 region. Optional.

access_key_id

AWS access key ID. Required.

secret_access_key

AWS secret access key. Required.

endpoint

Service endpoint for storage, for MinIO or other S3-compatible APIs. If using S3 itself, use the region parameter instead. Optional.

Example: https://localhost:9000

bucket

Name of the S3 bucket. Required.
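The two typical S3 setups can be sketched as follows. The region, credentials and bucket name are placeholders invented for illustration; the endpoint value is the example given above. Only one [object_store] table may appear in a real file, so the second variant is commented out.

```toml
# Variant A: AWS S3 itself (region only, no endpoint).
[object_store]
type = "s3"
region = "eu-west-1"                        # placeholder region
access_key_id = "YOUR_ACCESS_KEY_ID"        # placeholder
secret_access_key = "YOUR_SECRET_ACCESS_KEY" # placeholder
bucket = "my-seafowl-bucket"                # placeholder bucket name

# Variant B: MinIO or another S3-compatible service (endpoint instead of region).
# [object_store]
# type = "s3"
# endpoint = "https://localhost:9000"
# access_key_id = "YOUR_ACCESS_KEY_ID"
# secret_access_key = "YOUR_SECRET_ACCESS_KEY"
# bucket = "my-seafowl-bucket"
```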

type = "gcs"

Store data files in a GCS bucket.

bucket

Name of the GCS bucket. Required.

google_application_credentials

Path to the GCP JSON credentials file. Optional; the credentials can also be sourced from the GOOGLE_APPLICATION_CREDENTIALS environment variable or, on GCP VMs, from the metadata server.
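A GCS configuration might look like the sketch below. The bucket name and credentials path are hypothetical placeholders; omit google_application_credentials to fall back to the environment variable or the metadata server.

```toml
[object_store]
type = "gcs"
bucket = "my-seafowl-bucket"  # placeholder bucket name
# Optional; hypothetical path to a service account key file.
google_application_credentials = "/etc/seafowl/gcp-credentials.json"
```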

object_store.cache_properties section

This is an optional sub-section for the S3 object store, which enables caching of fetched object byte ranges. In addition, it performs range coalescing, by enforcing a minimum byte range threshold for fetching.

It stores the actual contents of the cached entries in a temporary directory on the local file system.

capacity

Maximum size of all objects in the cache. Defaults to 512 MB.

min_fetch_size

Determines the minimum range size for a byte fetch request. Defaults to 2 MB.

ttl_s

Time-to-live for the entries in the cache. Defaults to 3 minutes.
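The sketch below enables the cache for an S3 object store. It sets only ttl_s, on the assumption (suggested by the _s suffix, not stated above) that the value is in seconds. capacity and min_fetch_size are left at their defaults because the units expected in the config file are not specified here. The S3 values are placeholders.

```toml
[object_store]
type = "s3"
region = "eu-west-1"                        # placeholder
access_key_id = "YOUR_ACCESS_KEY_ID"        # placeholder
secret_access_key = "YOUR_SECRET_ACCESS_KEY" # placeholder
bucket = "my-seafowl-bucket"                # placeholder

# Adding this sub-section turns on caching of fetched byte ranges.
[object_store.cache_properties]
# Assumed to be in seconds: 180 s matches the documented 3-minute default.
ttl_s = 180
```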

catalog section

This section contains the configuration for the catalog used by Seafowl to store metadata (table names and mappings to partitions, index for partition pruning, UDF definitions etc).

Select the catalog by setting a type=... parameter and configure it by adding extra fields for the specific flavor.

type = "sqlite"

Default. Store the catalog in a local SQLite file.

dsn

Path to the SQLite file or the connection string. Default ./seafowl-data/seafowl.sqlite.

You can use :memory: here to use an in-memory SQLite database. Note that when using this option, restarting the process will lose all data. In addition, combining an in-memory catalog with a persistent object store (or vice versa) will lead to consistency issues.

journal_mode

Journal mode used by SQLite. Default wal. One of delete, truncate, persist, memory, wal, off. See the SQLite documentation for more information.

journal_mode = 'delete' is required to make a Seafowl instance work against LiteFS as a leader (since LiteFS doesn't support wal).

read_only

Open the SQLite database in read-only mode. Using journal_mode = 'off' and read_only = true is required to make a Seafowl instance work against a LiteFS replica.
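The LiteFS requirements above can be sketched as two alternative catalog sections. The DSN path is a hypothetical LiteFS mount point chosen for illustration. A real file may contain only one [catalog] table, so the replica variant is commented out.

```toml
# Leader: LiteFS does not support WAL, so use the delete journal mode.
[catalog]
type = "sqlite"
dsn = "/litefs/seafowl.sqlite"  # hypothetical path
journal_mode = "delete"

# Replica: no journal, opened read-only.
# [catalog]
# type = "sqlite"
# dsn = "/litefs/seafowl.sqlite"
# journal_mode = "off"
# read_only = true
```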

type = "postgres"

Store the catalog in a PostgreSQL database.

dsn

Connection URI to the PostgreSQL database, in the format postgresql://[user[:password]@][[host][:port][,...]][/dbname][?name=value[&...]]

Example: postgresql://user:secret@localhost
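In seafowl.toml, the example DSN above would be set like this (a minimal sketch; the user and password are the illustrative ones from the example):

```toml
[catalog]
type = "postgres"
dsn = "postgresql://user:secret@localhost"
```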

frontend.http section

This section contains the configuration for the HTTP frontend used to query Seafowl from Web applications. Omit this section to disable the HTTP frontend altogether.

write_access

Settings for write access to Seafowl (execution of any non-SELECT/EXPLAIN queries). This can be one of: any (anyone can write), off (disabled) or a SHA-256 hash of a password.

When Seafowl starts up without detecting a config file, it generates a password once, writes its hash to this section and prints the actual password to the logs.

If a config file already exists and this is omitted, it defaults to off.

To generate a new password, you can use this Bash snippet:

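# Generate a random alphanumeric password (32 characters unless a length is passed as $1)
pw=$(LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c "${1:-32}")
# Hash it with SHA-256 and keep only the 64 hex characters
pw_hash=$(printf '%s' "$pw" | sha256sum | head -c 64)
printf 'Password: %s\nHash: %s\n' "$pw" "$pw_hash"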

read_access

Settings for read access to Seafowl (execution of SELECT/EXPLAIN queries). This can be one of: any (anyone can read), off (disabled) or a SHA-256 hash of a password. By default, this is set to any.

The read password can be different from the write password.

bind_host

IP address to bind the HTTP frontend to. Default 127.0.0.1. To expose Seafowl to other machines on the network, use 0.0.0.0 here.

bind_port

Port for the HTTP frontend. Default 8080.

upload_data_max_length

Maximum size (in MB) of uploads to Seafowl's /upload endpoint. Default 2MB. Note that Seafowl currently keeps the whole uploaded file in memory, making the upload endpoint unsuitable for memory-constrained environments.

cache_control

The directives to send as the Cache-Control header value from the cached GET endpoint. Optional, defaults to max-age=43200, public.
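Putting the HTTP frontend settings together, a sketch of a password-protected writer with public reads might look like this. The write_access value is a placeholder and must be replaced with a real 64-character SHA-256 hex digest; the remaining values are the documented defaults written out explicitly.

```toml
[frontend.http]
bind_host = "127.0.0.1"
bind_port = 8080
read_access = "any"
# Placeholder: paste the SHA-256 hash of your write password here.
write_access = "REPLACE_WITH_64_CHARACTER_SHA256_HEX_DIGEST"
# In MB.
upload_data_max_length = 2
cache_control = "max-age=43200, public"
```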

frontend.postgres section

This section contains the configuration for the PostgreSQL frontend, which lets PostgreSQL clients query Seafowl. This endpoint doesn't support authentication or encryption and should only be used in development.

By default, this section is omitted and disabled.

bind_host

IP address to bind the PostgreSQL frontend to. Default 127.0.0.1. To expose Seafowl to other machines on the network, use 0.0.0.0 here.

bind_port

Port for the PostgreSQL frontend. Default 6432.
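Since this frontend is disabled unless the section is present, enabling it for local development could be sketched as follows, using the documented defaults:

```toml
# Development only: no authentication or encryption.
[frontend.postgres]
bind_host = "127.0.0.1"
bind_port = 6432
```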

misc section

Miscellaneous Seafowl configuration.

max_partition_size

Maximum length (in rows) of a Parquet file (partition) to produce when writing Seafowl tables. Default 1048576 (1024x1024).

For more information on partitioning, see the learning section.

gc_interval

Interval (in hours) at which a cron task will run garbage collection of orphan partitions (effectively invoking VACUUM PARTITIONS).

Default is 0 (i.e. the task is not run at all).
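As an illustration, the fragment below keeps the default partition size and schedules garbage collection once a day. The 24-hour interval is an arbitrary example value, not a recommendation.

```toml
[misc]
# Rows per Parquet file (the documented default, 1024 x 1024).
max_partition_size = 1048576
# In hours; 0 (the default) disables the task.
gc_interval = 24
```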

runtime section

Various configuration settings related to executing queries.

max_memory

Guideline for the maximum amount of RAM (in MB) that DataFusion may use when executing queries. When an operation needs more memory than this, DataFusion spills data to disk. Note that DataFusion currently doesn't always respect this amount, so it is not a guaranteed cap on RAM usage.

Default unlimited.

temp_dir

Override the temporary directory used to spill files during execution when DataFusion reaches the memory limit.
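A sketch of a memory-constrained setup follows. Both values are illustrative assumptions: a 512 MB guideline and a hypothetical spill directory.

```toml
[runtime]
# In MB; a guideline, not a hard cap.
max_memory = 512
# Hypothetical directory for spill files.
temp_dir = "/var/tmp/seafowl-spill"
```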