The Apache™ clinical Text Analysis and Knowledge Extraction System (cTAKES™) focuses on extracting knowledge from clinical text through Natural Language Processing (NLP) techniques.
cTAKES is engineered in a modular fashion and employs leading-edge rule-based and machine learning methods.
cTAKES has standard features for biomedical text processing software, including the ability to extract concepts such as symptoms, procedures, diagnoses, medications and anatomy with attributes and standard codes.
More powerful components can perform tasks as complex as identifying temporal events, dates and times – resulting in placement of events in a patient timeline.
Components are trained on gold standards from both the biomedical and the general domain. This makes cTAKES usable across different types of clinical narrative (e.g. radiology reports, clinical notes, discharge summaries) in various institution formats, as well as other types of health-related narrative (e.g. Twitter feeds), using multiple data standards (e.g. Health Level 7 (HL7), Clinical Document Architecture (CDA), Fast Healthcare Interoperability Resources (FHIR), SNOMED CT, RxNorm).
cTAKES is the NLP platform for many initiatives across the world covering a variety of research purposes and large datasets. Contributors include professionals at medical and commercial institutions, NLP and machine learning researchers, medical doctors, and students of many disciplines and levels. We encourage people from all backgrounds to get involved!
java -version
python -V
[!NOTE] If you are using an integrated development environment (IDE), please see its documentation on using git, Java, Python, and Apache Maven. You should be able to use features in your IDE instead of running commands in a terminal.
mvn -version
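The version checks above can be combined into one pass. The following is a minimal sketch, not part of cTAKES: the helper name `check_tools` is hypothetical, and the tool names are taken from the note above. It only tests whether each command is on your `PATH`; it does not verify version numbers.

```shell
#!/bin/sh
# Report which of the named commands are missing from PATH.
# Returns 0 when every command is found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Assumed command names; adjust if your system uses, for example, python3.
check_tools git java python mvn || echo "Install the missing tools before building cTAKES."
```

If every tool is reported as found, run the individual version commands to confirm that the installed versions are suitable.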
For access to all cTAKES capabilities, download a pre-built copy of a cTAKES installation from the release area.
The names of pre-built installations follow the format `apache-ctakes-#.#.#-bin.zip`. After unzipping the release file and obtaining a UMLS license, use the UMLS Package Fetcher GUI to install a copy of the default dictionary for Named Entity Recognition (NER) using cTAKES Fast Dictionary Lookup. You can then use the Piper File Submitter GUI to submit jobs, or run any of the scripts in the `bin/` directory.
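As an illustration of the naming format and the unzip step, here is a small sketch. The function names `release_archive_name` and `unpack_release` are hypothetical helpers, not cTAKES scripts. The sketch assumes the archive has already been downloaded into the current directory and that `unzip` is installed.

```shell
#!/bin/sh
# Build the archive name from a version string, following the
# apache-ctakes-#.#.#-bin.zip format described above.
release_archive_name() {
    echo "apache-ctakes-$1-bin.zip"
}

# Unpack a downloaded release archive into a destination directory.
# Assumes the archive sits in the current directory.
unpack_release() {
    archive=$(release_archive_name "$1")
    dest=$2
    if [ ! -f "$archive" ]; then
        echo "archive not found: $archive" >&2
        return 1
    fi
    mkdir -p "$dest" && unzip -q "$archive" -d "$dest"
}

# Hypothetical usage; substitute the version you actually downloaded:
#   unpack_release X.Y.Z "$HOME/ctakes"
```

The dictionary installation and job submission steps still go through the GUIs named above; this sketch covers only the unpacking.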
All source code for cTAKES versions 5+ is available from the cTAKES GitHub repository.
git clone https://github.com/apache/ctakes.git
mvn clean compile
`resources/org/apache/ctakes/dictionary/lookup/fast` directory.

[!TIP] As an alternative to steps 3 and 4, you can use the UMLS Package Fetcher GUI. Run the class `DictionaryDownloader.java` to launch that tool, or use the `getUmlsDictionary` script if using a full build of cTAKES.
`PiperFileRunner.java`. To use the Piper File Submitter GUI, run the `PiperRunnerGui.java` class.

[!NOTE] To run the cTAKES Java classes, the full Java classpath must be configured. Setting up a classpath is beyond the scope of this document.
An integrated development environment (IDE) should set up the classpath for you; please see its documentation.
[!IMPORTANT] You cannot run scripts in the `bin/` directory within a development environment. Within a cTAKES development environment you can run Java classes and Maven profiles, but not the scripts in the `bin/` directory.
[!TIP] You can build your own cTAKES installation from a development environment using Apache Maven. A cTAKES installation is required to run scripts in the `bin/` directory.
mvn clean compile package
[!NOTE] If you are using an integrated development environment (IDE), please see its documentation on using Apache Maven.
After packaging, there should be tar and zip files for `apache-ctakes-#.#.#.-bin` and `apache-ctakes-#.#.#.-src` in your `ctakes-distribution/target/` directory.
7. Unzip the `apache-ctakes-#.#.#.-bin` file into a directory outside your cTAKES development area.
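The final unzip step requires a target directory outside the development area. This can be checked before extracting. The sketch below is one way to do that, not a cTAKES utility: `is_outside` is a hypothetical helper, and the paths in the usage comment are placeholders.

```shell
#!/bin/sh
# Succeed (return 0) only if directory $2 is neither $1 itself nor beneath it.
# Both arguments are expected to be absolute paths without a trailing slash.
is_outside() {
    case "$2/" in
        "$1"/*) return 1 ;;
        *)      return 0 ;;
    esac
}

# Hypothetical usage, run from the root of the cTAKES source tree after packaging:
#   dev_area=$(pwd)
#   target="$HOME/ctakes-install"
#   is_outside "$dev_area" "$target" || { echo "choose a directory outside $dev_area" >&2; exit 1; }
#   mkdir -p "$target"
#   unzip -q ctakes-distribution/target/apache-ctakes-*-bin.zip -d "$target"
```

The check is a plain string comparison, so it does not resolve symbolic links or relative paths; pass absolute paths for a reliable result.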
You can write to the cTAKES user and developer mailing lists, `user at ctakes.apache.org` and `dev at ctakes.apache.org`, and find answers to previously asked questions by searching the user and developer mail archives.