splitgraph/medicare-provider-utilization-and-payment-data-snapshot
Open repository in Console
 
Readme
Updated over 3 years ago
Indexed over 3 years ago

Medicare Provider Utilization and Payment Data: Physician and Other Supplier PUF CY2017

This is a snapshot of the https://www.splitgraph.com/cms-gov/medicare-provider-utilization-and-payment-data-fs4p-t5eq live dataset.

Reproducing

To build this snapshot yourself:

# Initialize the target repository
sgr init medicare-provider-utilization-and-payment-data-snapshot

# Mount the remote dataset on the local engine
sgr mount socrata \
  "cms-gov/medicare-provider-utilization-and-payment-data-fs4p-t5eq" \
  --handler-options '{
     "domain": "data.cms.gov",
     "batch_size": 50000,
     "tables": {
          "medicare_provider_utilization_and_payment_data": "fs4p-t5eq"
     }
  }'

# Snapshot it (this will actually get all the data from the upstream, will take ~30 minutes or so
sgr sql -s medicare-provider-utilization-and-payment-data-snapshot \
    'CREATE UNLOGGED TABLE data AS 
        SELECT * FROM "cms-gov/medicare-provider-utilization-and-payment-data-fs4p-t5eq".medicare_provider_utilization_and_payment_data'

# Add a PK on the :id column
sgr sql -s medicare-provider-utilization-and-payment-data-snapshot 'ALTER TABLE data ADD PRIMARY KEY (":id")'

# Commit as a Splitgraph image, add bloom filters on a couple of columns for faster scans
sgr commit medicare-provider-utilization-and-payment-data-snapshot \
    --chunk-size 100000 \
    --index-options '{"data": 
    {"bloom": {
        "nppes_provider_first_name": {"probability": 0.01},
        "nppes_provider_last_org_name": {"probability": 0.01},
        "nppes_provider_first_name": {"probability": 0.01},
        "nppes_provider_city": {"probability": 0.01},
        "nppes_provider_state": {"probability": 0.01},
        "nppes_provider_zip": {"probability": 0.01}}}}'

# Inspect the image
sgr show medicare-provider-utilization-and-payment-data-snapshot:latest

# Run a query against the image
sgr sql -i medicare-provider-utilization-and-payment-data-snapshot:latest \
    'EXPLAIN ANALYZE
    SELECT nppes_provider_state, COUNT(1) FROM data
    GROUP BY nppes_provider_state'
 
Preview
  • data
    99 Objects
     | 1191 MB | 
    9.85M Rows