Splitgraph is a data API to power your analytics, data visualizations and other read-intensive applications.
Here are all the tables you will be able to access when you use Splitgraph to query CSV files stored in S3 or served over HTTP. We have also listed some useful queries that you can run.
repositories:
- namespace: CHANGEME
  repository: csv
  # Catalog-specific metadata for the repository. Optional.
  metadata:
    readme:
      text: Readme
    description: Description of the repository
    topics:
    - sample_topic
  # Data source settings for the repository. Optional.
  external:
    # Name of the credential that the plugin uses. This can also be a credential_id if the
    # credential is already registered on Splitgraph.
    credential: csv
    plugin: csv
    # Plugin-specific parameters matching the plugin's parameters schema
    params:
      connection: # Connection. Choose one of:
      - connection_type: http # REQUIRED. Connection type. Constant
        url: '' # REQUIRED. URL. HTTP URL to the CSV file
      - connection_type: s3 # REQUIRED. Connection type. Constant
        s3_endpoint: '' # REQUIRED. S3 endpoint. S3 endpoint (including port if required)
        s3_bucket: '' # REQUIRED. Bucket name. Bucket the object is in
        s3_region: '' # S3 region. Region of the S3 bucket
        s3_secure: false # Secure. Whether to use HTTPS for S3 access
        s3_object: '' # S3 Object name. Limit the import to a single object
        s3_object_prefix: '' # S3 Object prefix. Prefix for object in S3 bucket
      autodetect_header: true # Autodetect header. Detect whether the CSV file has a header automatically
      autodetect_dialect: true # Autodetect dialect. Detect the CSV file's dialect (separator, quoting characters etc) automatically
      autodetect_encoding: true # Autodetect encoding. Detect the CSV file's encoding automatically
      autodetect_sample_size: 65536 # Sample size. Sample size, in bytes, for encoding/dialect/header detection
      schema_inference_rows: 100000 # Schema inference rows. Number of rows to use for schema inference
      encoding: utf-8 # Encoding. Encoding of the CSV file
      ignore_decode_errors: false # Ignore decoding errors. Ignore errors when decoding the file
      header: true # First line of the CSV file is its header
      delimiter: ',' # Delimiter. Character used to separate fields in the file
      quotechar: '"' # Quote character. Character used to quote fields
    tables:
      sample_table:
        # Plugin-specific table parameters matching the plugin's schema
        options:
          cursor_fields: [] # Replication cursor. Column(s) to use as a replication cursor. This must be always increasing in the source table and is used to track which rows should be replicated.
          url: '' # URL. HTTP URL to the CSV file
          s3_object: '' # S3 object. S3 object of the CSV file
          autodetect_header: true # Autodetect header. Detect whether the CSV file has a header automatically
          autodetect_dialect: true # Autodetect dialect. Detect the CSV file's dialect (separator, quoting characters etc) automatically
          autodetect_encoding: true # Autodetect encoding. Detect the CSV file's encoding automatically
          autodetect_sample_size: 65536 # Sample size. Sample size, in bytes, for encoding/dialect/header detection
          schema_inference_rows: 100000 # Schema inference rows. Number of rows to use for schema inference
          encoding: utf-8 # Encoding. Encoding of the CSV file
          ignore_decode_errors: false # Ignore decoding errors. Ignore errors when decoding the file
          header: true # First line of the CSV file is its header
          delimiter: ',' # Delimiter. Character used to separate fields in the file
          quotechar: '"' # Quote character. Character used to quote fields
        # Schema of the table, a list of objects with `name` and `type`. If set to `[]`, will infer.
        schema: []
    # Whether live querying is enabled for the plugin (creates a "live" tag in the
    # repository proxying to the data source). The plugin must support live querying.
    is_live: true
    # Ingestion schedule settings. Disable this if you're using GitHub Actions or other methods
    # to trigger ingestion.
    schedule:
credentials:
  csv: # This is the name of this credential that "external" sections can reference.
    plugin: csv
    # Credential-specific data matching the plugin's credential schema
    data:
      s3_access_key: '' # AWS Access Key
      s3_secret_key: '' # AWS Secret Access Key
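Once this configuration is loaded and live querying is enabled, the table becomes queryable with ordinary SQL. As a minimal sketch, using the placeholder names from the template above (the CHANGEME/csv repository and its sample_table), a first exploratory query could look like this:

-- Preview the first rows of the CSV-backed table. The repository and
-- table names are the placeholders from the template above; substitute your own.
SELECT *
FROM "CHANGEME/csv"."sample_table"
LIMIT 10;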
Use our splitgraph.yml format to check your Splitgraph configuration into version control, trigger ingestion jobs and manage your data stack the way you manage your code.
Splitgraph connects your vast, unrelated data sources and puts them in a single, accessible place.
Splitgraph handles data integration, storage, transformation and discoverability for you. All that remains is adding a BI client.
Focus on building data-driven applications without worrying about where the data will come from.
Splitgraph supports data ingestion from over 100 SaaS services, as well as data federation to over a dozen databases. These are all made queryable over a PostgreSQL-compatible interface.
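Because every source surfaces through the same PostgreSQL-compatible interface, tables from unrelated systems can be joined in a single query. The sketch below reuses the placeholder CSV table and joins it against a second, purely hypothetical repository (other_namespace/other_repo) with hypothetical columns:

-- Federated join across two data sources; all names besides the
-- CHANGEME/csv placeholder are hypothetical, for illustration only.
SELECT c.id, c.value, d.label
FROM "CHANGEME/csv"."sample_table" c
JOIN "other_namespace/other_repo"."dimension_table" d
  ON d.id = c.id;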
Splitgraph stores data in a columnar format. This accelerates analytical queries and makes it perfect for dashboards, blogs and other read-intensive use cases.
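Columnar storage pays off most for aggregations that scan a few columns across many rows, which is the typical shape of a dashboard query. A sketch, again with hypothetical column names:

-- Scan only two columns (category, amount) instead of whole rows;
-- this is the access pattern a columnar layout accelerates.
SELECT category,
       count(*) AS row_count,
       sum(amount) AS total_amount
FROM "CHANGEME/csv"."sample_table"
GROUP BY category
ORDER BY total_amount DESC;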
Read more about Splitgraph’s support for CSV files in S3 and over HTTP, including the plugin documentation and sample queries you can run on this data with Splitgraph.
Splitgraph has a PostgreSQL-compatible endpoint that most BI clients can connect to.
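For example, any tool that speaks the PostgreSQL wire protocol can connect with a standard connection string. The line below assumes Splitgraph's DDN endpoint at data.splitgraph.com:5432, API credentials used as the username and password, and ddn as the database name; check the connection documentation for the exact values for your account:

psql "postgresql://API_KEY:API_SECRET@data.splitgraph.com:5432/ddn"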