Open repository in Console
Readme
Updated over 3 years ago
‧ Indexed over 3 years ago
Medicare Provider Utilization and Payment Data: Physician and Other Supplier PUF CY2017
This is a snapshot of the https://www.splitgraph.com/cms-gov/medicare-provider-utilization-and-payment-data-fs4p-t5eq live dataset.
Reproducing
To build this snapshot yourself:
# Initialize the target repository sgr init medicare-provider-utilization-and-payment-data-snapshot # Mount the remote dataset on the local engine sgr mount socrata \ "cms-gov/medicare-provider-utilization-and-payment-data-fs4p-t5eq" \ --handler-options '{ "domain": "data.cms.gov", "batch_size": 50000, "tables": { "medicare_provider_utilization_and_payment_data": "fs4p-t5eq" } }' # Snapshot it (this will actually get all the data from the upstream, will take ~30 minutes or so sgr sql -s medicare-provider-utilization-and-payment-data-snapshot \ 'CREATE UNLOGGED TABLE data AS SELECT * FROM "cms-gov/medicare-provider-utilization-and-payment-data-fs4p-t5eq".medicare_provider_utilization_and_payment_data' # Add a PK on the :id column sgr sql -s medicare-provider-utilization-and-payment-data-snapshot 'ALTER TABLE data ADD PRIMARY KEY (":id")' # Commit as a Splitgraph image, add bloom filters on a couple of columns for faster scans sgr commit medicare-provider-utilization-and-payment-data-snapshot \ --chunk-size 100000 \ --index-options '{"data": {"bloom": { "nppes_provider_first_name": {"probability": 0.01}, "nppes_provider_last_org_name": {"probability": 0.01}, "nppes_provider_first_name": {"probability": 0.01}, "nppes_provider_city": {"probability": 0.01}, "nppes_provider_state": {"probability": 0.01}, "nppes_provider_zip": {"probability": 0.01}}}}' # Inspect the image sgr show medicare-provider-utilization-and-payment-data-snapshot:latest # Run a query against the image sgr sql -i medicare-provider-utilization-and-payment-data-snapshot:latest \ 'EXPLAIN ANALYZE SELECT nppes_provider_state, COUNT(1) FROM data GROUP BY nppes_provider_state'
Preview
- data99 Objects| 1191 MB |9.85M Rows