Every two weeks or so, I want to highlight a medical dataset that is semi-publicly available. I say "semi" because some medical datasets (especially the most useful ones) require a data use agreement. Getting access typically requires a project statement, a signed agreement that you won't do anything nefarious or try to re-identify people in the dataset, and, optionally but recommended, human subjects training through freely available resources like CITI.

These datasets are part of a lecture I give to my students about data sources. Search the tag "medical datasets" to get a list of all blog posts.

For each dataset, I will highlight the following basic information:

  • name of the dataset.
  • author.
  • short description of purpose.
  • number of rows.
  • number of features.
  • general description of features.
  • data format (CSV, SAS, etc.).
  • URL of the data.
  • URL of the data dictionary.
  • one or two links to papers that use the dataset.
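
For readers who want to keep their own notes in the same shape, the fields above could be captured in a small record type. This is only a sketch: the class name `DatasetProfile`, its field names, and the `summary` helper are my own hypothetical choices, and the example values are placeholders rather than a real dataset.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetProfile:
    """One entry per highlighted dataset (hypothetical structure)."""
    name: str                 # name of the dataset
    author: str               # author or maintaining organization
    purpose: str              # short description of purpose
    n_rows: int               # number of rows
    n_features: int           # number of features
    feature_summary: str      # general description of features
    data_format: str          # e.g. "csv", "sas"
    data_url: str             # link to the data
    dictionary_url: str       # link to the data dictionary
    paper_urls: List[str] = field(default_factory=list)  # one or two papers

    def summary(self) -> str:
        """Return a one-line overview of the dataset."""
        return (f"{self.name} ({self.data_format}): "
                f"{self.n_rows} rows x {self.n_features} features")


# Placeholder values only; not a real dataset or real URLs.
example = DatasetProfile(
    name="Example Dataset",
    author="Example Author",
    purpose="Illustrates the fields listed above.",
    n_rows=1000,
    n_features=25,
    feature_summary="Demographics, lab values, outcomes.",
    data_format="csv",
    data_url="https://example.org/data.csv",
    dictionary_url="https://example.org/dictionary.pdf",
    paper_urls=["https://example.org/paper1"],
)

print(example.summary())
```

Keeping the paper links as a list makes it easy to record one, two, or no papers for a dataset.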