Query CSV files in S3/HTTP data

With its PostgreSQL-compatible interface, Splitgraph is the easiest way to get data from CSV files in S3/HTTP queryable with any of your BI tools.


What is Splitgraph?

Splitgraph is a data API to power your analytics, data visualizations and other read-intensive applications.

Get started
 

Connecting CSV files in S3/HTTP to your query tool with Splitgraph

First, connect Splitgraph to CSV files in S3/HTTP. This creates a Splitgraph repository containing the data from your CSV files.
Then, connect your query tool to Splitgraph. Because the repository is exposed over a PostgreSQL-compatible interface, data from CSV files in S3/HTTP becomes available to query directly from any SQL client or BI tool that supports PostgreSQL.
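As a minimal sketch, connecting a client usually comes down to assembling a standard PostgreSQL connection string. The host, port, and database name below are assumptions based on Splitgraph's documented PostgreSQL-compatible endpoint; the credentials are placeholders you would replace with your own API key and secret.

```python
# Sketch: building a PostgreSQL connection URI for Splitgraph.
# Host/port/dbname are assumptions; user/password are hypothetical placeholders.
from urllib.parse import quote

host = "data.splitgraph.com"   # assumed Splitgraph endpoint
port = 5432
user = "your-api-key"          # placeholder: your API key
password = "your-api-secret"   # placeholder: your API secret
dbname = "ddn"                 # assumed database name

# URL-encode the credentials in case they contain reserved characters.
dsn = f"postgresql://{quote(user)}:{quote(password)}@{host}:{port}/{dbname}"
print(dsn)
```

Any PostgreSQL driver (psycopg2, JDBC, etc.) or BI tool can then accept this URI or its individual components.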

Tables available in the CSV files in S3/HTTP data source

Below is the data source configuration describing the tables you will be able to access when you use Splitgraph to query CSV files in S3/HTTP data.
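Once the repository exists, its tables are addressed with schema-qualified SQL names. A brief sketch of how such a query string is built, using the placeholder names from the configuration below (`CHANGEME/csv` and `sample_table` are placeholders, not real identifiers):

```python
# Sketch: addressing a Splitgraph repository table in SQL.
# The schema name combines namespace and repository; identifiers are
# double-quoted because the schema name contains a slash.
namespace = "CHANGEME"     # placeholder namespace from the config below
repository = "csv"
table = "sample_table"

query = f'SELECT * FROM "{namespace}/{repository}"."{table}" LIMIT 10;'
print(query)
```

You would run this query from any PostgreSQL client connected to Splitgraph.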

repositories:
- namespace: CHANGEME
  repository: csv
  # Catalog-specific metadata for the repository. Optional.
  metadata:
    readme:
      text: Readme
    description: Description of the repository
    topics:
    - sample_topic
  # Data source settings for the repository. Optional.
  external:
    # Name of the credential that the plugin uses. This can also be a credential_id if the
    # credential is already registered on Splitgraph.
    credential: csv
    plugin: csv
    # Plugin-specific parameters matching the plugin's parameters schema
    params:
      connection:  # Connection. Choose one of:
      - connection_type: http  # REQUIRED. Connection type. Constant
        url: '' # REQUIRED. URL. HTTP URL to the CSV file
      - connection_type: s3  # REQUIRED. Connection type. Constant
        s3_endpoint: '' # REQUIRED. S3 endpoint. S3 endpoint (including port if required)
        s3_bucket: '' # REQUIRED. Bucket name. Bucket the object is in
        s3_region: '' # S3 region. Region of the S3 bucket
        s3_secure: false # Secure. Whether to use HTTPS for S3 access
        s3_object: '' # S3 Object name. Limit the import to a single object
        s3_object_prefix: '' # S3 Object prefix. Prefix for object in S3 bucket
      autodetect_header: true # Autodetect header. Detect whether the CSV file has a header automatically
      autodetect_dialect: true # Autodetect dialect. Detect the CSV file's dialect (separator, quoting characters etc) automatically
      autodetect_encoding: true # Autodetect encoding. Detect the CSV file's encoding automatically
      autodetect_sample_size: 65536 # Sample size. Sample size, in bytes, for encoding/dialect/header detection
      schema_inference_rows: 100000 # Schema inference rows. Number of rows to use for schema inference
      encoding: utf-8 # Encoding. Encoding of the CSV file
      ignore_decode_errors: false # Ignore decoding errors. Ignore errors when decoding the file
      header: true # First line of the CSV file is its header
      delimiter: ',' # Delimiter. Character used to separate fields in the file
      quotechar: '"' # Quote character. Character used to quote fields
    tables:
      sample_table:
        # Plugin-specific table parameters matching the plugin's schema
        options:
          cursor_fields: []  # Replication cursor. Column(s) to use as a replication cursor. This must be always increasing in the source table and is used to track which rows should be replicated.
          url: '' # URL. HTTP URL to the CSV file
          s3_object: '' # S3 object. S3 object of the CSV file
          autodetect_header: true # Autodetect header. Detect whether the CSV file has a header automatically
          autodetect_dialect: true # Autodetect dialect. Detect the CSV file's dialect (separator, quoting characters etc) automatically
          autodetect_encoding: true # Autodetect encoding. Detect the CSV file's encoding automatically
          autodetect_sample_size: 65536 # Sample size. Sample size, in bytes, for encoding/dialect/header detection
          schema_inference_rows: 100000 # Schema inference rows. Number of rows to use for schema inference
          encoding: utf-8 # Encoding. Encoding of the CSV file
          ignore_decode_errors: false # Ignore decoding errors. Ignore errors when decoding the file
          header: true # First line of the CSV file is its header
          delimiter: ',' # Delimiter. Character used to separate fields in the file
          quotechar: '"' # Quote character. Character used to quote fields
        # Schema of the table, a list of objects with `name` and `type`.
        # If set to `[]`, the schema will be inferred.
        schema: []
    # Whether live querying is enabled for the plugin (creates a "live" tag in the
    # repository proxying to the data source). The plugin must support live querying.
    is_live: true
    # Ingestion schedule settings. Disable this if you're using GitHub Actions or other methods
    # to trigger ingestion.
    schedule:
credentials:
  csv:  # This is the name of this credential that "external" sections can reference.
    plugin: csv
    # Credential-specific data matching the plugin's credential schema
    data:
      s3_access_key: ''  # AWS Access Key
      s3_secret_key: '' # AWS Secret Access Key
Use Data Source in splitgraph.yml
You can copy this into splitgraph.yml, or we'll generate it for you.
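To illustrate what the `autodetect_dialect` and `autodetect_header` options do conceptually, here is a stdlib-only sketch using Python's `csv.Sniffer`. Splitgraph's own detection logic may differ; this only demonstrates the general idea of inferring a delimiter and header from a sample of the file.

```python
# Sketch: dialect and header detection on a CSV sample, analogous to the
# autodetect_dialect / autodetect_header options above (implementation differs).
import csv

sample = "name;age;city\nalice;30;Berlin\nbob;25;Paris\n"

dialect = csv.Sniffer().sniff(sample)        # infers delimiter, quoting, etc.
has_header = csv.Sniffer().has_header(sample)  # guesses if row 1 is a header

print(dialect.delimiter, has_header)
```

Similarly, `autodetect_sample_size` bounds how many bytes of the file are read for this kind of inference.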

Developer-first

Use our splitgraph.yml format to check your Splitgraph configuration into version control, trigger ingestion jobs and manage your data stack the same way you manage your code.

Get started
 

Why Splitgraph and CSV files in S3/HTTP?

Splitgraph connects your disparate data sources and puts them in a single, accessible place.

Unify your data stack

Splitgraph handles data integration, storage, transformation and discoverability for you. All that remains is adding a BI client.

Read more
 

Power your applications

Focus on building data-driven applications without worrying about where the data will come from.


Not just CSV files in S3/HTTP...

Splitgraph supports data ingestion from over 100 SaaS services, as well as data federation to over a dozen databases. These are all made queryable over a PostgreSQL-compatible interface.


Optimized for analytics

Splitgraph stores data in a columnar format. This accelerates analytical queries and makes it perfect for dashboards, blogs and other read-intensive use cases.


Do more with CSV files in S3/HTTP


CSV files in S3/HTTP on Splitgraph

Read more about Splitgraph’s support for CSV files in S3/HTTP, including documentation and sample queries you can run on the data.

CSV files in S3/HTTP overview
 

Connecting to Splitgraph

Splitgraph has a PostgreSQL-compatible endpoint that most BI clients can connect to.

Try it out