cdc-gov/cdc-text-corpora-for-learners-html-mirrors-of-mmwr-ut5n-bmc3
Readme
Updated over 1 year ago
Indexed 11 months ago

CDC Text Corpora for Learners: HTML Mirrors of MMWR, EID, and PCD

The attached ZIP archives are part of the <a href="https://github.com/cmheilig/harvest-cdc-journals">CDC Text Corpora for Learners</a> program. This version, comprising 33,567 articles, was constructed on 2024-03-01 using source content retrieved on 2024-01-09.

The three attached ZIP archives contain the 33,567 articles in 33,576 compiled HTML mirrors of the MMWR <a href="https://www.cdc.gov/mmwr/">Morbidity and Mortality Weekly Report</a>, including its series <i>Weekly Reports</i>, <i>Recommendations and Reports</i>, <i>Surveillance Summaries</i>, <i>Supplements</i>, and <i>Notifiable Diseases</i> (a subset of <i>Weekly Reports</i>, constructed ad hoc); EID <a href="https://www.cdc.gov/eid/">Emerging Infectious Diseases</a>; and PCD <a href="https://www.cdc.gov/pcd/">Preventing Chronic Disease</a>. There is one archive per series. The archive attachments are located in the <i>About this Dataset</i> section of this landing page. After you click Show More in that section, the attachments appear under <i>Attachments</i>.

The files were retrieved and organized with as few changes to the raw sources as possible, to support as many downstream uses as possible.

Querying over HTTP

Splitgraph serves as an HTTP API that lets you run SQL queries directly on this data to power Web applications. For example:

curl https://data.splitgraph.com/sql/query/ddn \
    -H "Content-Type: application/json" \
    -d@-<<EOF
{"sql": "
    SELECT *
    FROM \"cdc-gov/cdc-text-corpora-for-learners-html-mirrors-of-mmwr-ut5n-bmc3\".\"cdc_text_corpora_for_learners_html_mirrors_of_mmwr\"
    LIMIT 100 
"}
EOF
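
For a Web application backend, the same request can be sketched in Python using only the standard library. The endpoint URL and the repository and table names are copied from the curl example above; the helper names `build_payload` and `run_query` are hypothetical, and the layout of the JSON response is not described on this page, so the sketch returns it unparsed beyond JSON decoding.

```python
import json
import urllib.request

# Endpoint and identifiers copied from the curl example above.
DDN_URL = "https://data.splitgraph.com/sql/query/ddn"
REPOSITORY = "cdc-gov/cdc-text-corpora-for-learners-html-mirrors-of-mmwr-ut5n-bmc3"
TABLE = "cdc_text_corpora_for_learners_html_mirrors_of_mmwr"


def build_payload(limit=100):
    """Return the JSON request body as bytes (hypothetical helper).

    json.dumps escapes the double quotes around the repository and
    table names, so no manual backslashes are needed.
    """
    sql = f'SELECT * FROM "{REPOSITORY}"."{TABLE}" LIMIT {int(limit)}'
    return json.dumps({"sql": sql}).encode("utf-8")


def run_query(limit=100, timeout=30):
    """POST the query and return the decoded JSON response (hypothetical helper).

    Assumption: the endpoint replies with a JSON document; its exact
    fields are not documented here, so nothing further is extracted.
    """
    request = urllib.request.Request(
        DDN_URL,
        data=build_payload(limit),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `run_query(limit=5)` performs a real network request. Building the body with `json.dumps` rather than by hand avoids the quote escaping that the shell heredoc requires.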

See the Splitgraph documentation for more information.

 
Preview
  • cdc_text_corpora_for_learners_html_mirrors_of_mmwr
Upstream Metadata