Series on medical datasets

Every two weeks or so, I want to highlight a medical dataset that is semi-publicly available. I say "semi" because some medical datasets (especially the most useful ones) require a data use agreement. These agreements typically require a project statement, a signed pledge that you won't do anything nefarious or try to re-identify people in the dataset, and, optionally but recommended, human subjects training through freely available resources like CITI.

These datasets are part of a lecture I give to my students about data sources. Search the tag "medical datasets" to get a list of all blog posts.

For each dataset, I will highlight the following basic information:

  • name of the dataset.
  • author.
  • short description of purpose.
  • number of rows.
  • number of features.
  • general description of features.
  • data format (CSV, SAS, etc.).
  • URL of the data.
  • URL of the data dictionary.
  • one or two links to papers that use the dataset.
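
For students who want to keep their own notes in the same shape, the fields above can be sketched as a small Python record. This is only an illustration: the class name `DatasetSummary`, its field names, and the example values (including the example.org URLs) are all made up for this sketch and do not describe any real dataset.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetSummary:
    """One entry in the series; field names are hypothetical."""
    name: str
    author: str
    purpose: str               # short description of purpose
    n_rows: int
    n_features: int
    feature_description: str   # general description of features
    data_format: str           # e.g. "csv" or "sas"
    data_url: str
    dictionary_url: str
    paper_urls: List[str] = field(default_factory=list)

    def headline(self) -> str:
        # One-line summary suitable for a slide or a post title.
        return (f"{self.name} ({self.data_format}): "
                f"{self.n_rows} rows x {self.n_features} features")


# Placeholder values only; this is not a real dataset.
example = DatasetSummary(
    name="Example Dataset",
    author="Example Author",
    purpose="Placeholder entry showing the layout",
    n_rows=1000,
    n_features=25,
    feature_description="Demographics and lab values (placeholder)",
    data_format="csv",
    data_url="https://example.org/data",
    dictionary_url="https://example.org/dictionary",
)
print(example.headline())
```

Keeping every entry in one fixed shape makes it easy to compare datasets side by side or dump the whole list to a table later.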

NY Tech Meetup Hack of the Month

Yesterday, with about 4 hours of notice, I agreed to present our work on using text classifiers to identify alcohol use on Twitter as the NY Tech Meetup October Hack of the Month. My goal was to convey, in 5 minutes, a small glimpse of the science that drives machine learning models. I felt humbled in the presence of startup teams working on the front lines to implement solutions to people's problems. Many thanks to Brandon Diamond for keeping it light and fun. Here's a copy of my slides.

My New Home

After several years at, it's time to give the site a bit of a revamp and upgrade. I am trying Squarespace in place of my original WordPress client. I like the changes so far, and for the types of updates I'm doing, sticking to something simple to maintain is the way to go.