tree: 694c66dccfc1a4377c60396b85b1cdc94baf7687 [path history] [tgz]
  1. desc/
  2. metadata/
  3. src/
  4. LICENSE
  5. pom.xml
  6. README.md
ctakes-smoking-status/README.md

This is version 1.2 of the cTAKES smoking status annotator. The pipeline has been tested on flat text files and CDA documents. The sample provided uses a bar (|) delimited file with multiple records (and patients) per file. The dictionary lookup annotator is limited to smoking status dictionaries provided in the ‘/resources/ss/data/*.dictionary’ files.

The smoking status pipeline processes patient records into five pre-determined categories - past smoker (P), current smoker (C), smoker (S), non-smoker (N), and unknown (U). The definition of smoking status was adapted from I2B2 Natural Language Processing Challenges for Clinical Records.

  • PAST SMOKER (P): A patient whose record asserts either that they are a past smoker or that they were a smoker a year or more ago but who have not smoked for at least one year.
  • CURRENT SMOKER (C): A patient whose record asserts that they are a current smoker (or that they smoked without indicating that they stopped more than a year ago) or that they were a smoker within the past year.
  • SMOKER (S): A patient who is either a CURRENT or a PAST smoker but, whose medical record does not provide enough information to classify the patient as either a CURRENT or a PAST smoker.
  • NON-SMOKER (N): A patient whose record indicates that they have never smoked.
  • UNKNOWN (U): The patient's record does not mention anything about smoking.