| <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "https://www.w3.org/TR/html4/loose.dtd"> |
| |
| |
| <!-- ====================================================================== --> |
| <!-- GENERATED FILE, DO NOT EDIT, EDIT THE XML FILE IN xdocs INSTEAD! --> |
| <!-- ====================================================================== --> |
| <html> |
| <head> |
| <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"/> |
| <style type="text/css">@import "stylesheets/base.css";</style> |
| <meta name="author" value=" |
| Apache UIMA Documentation Team"> |
| <meta name="email" value="dev@uima.apache.org"> |
| |
| |
| |
| <title>Apache UIMA - Apache UIMA Addons and Sandbox</title> |
| |
| </head> |
| |
| <body> |
| <div class="topLogos"> |
| <table border="0" width="100%" cellspacing="0"> |
| <!-- TOP IMAGE --> |
| <tr> |
| <td align='LEFT'> |
| <a href="index.html"> |
| <img style="border: 1px solid black;" src="./images/UIMA_banner2tlpTm.png" alt="UIMA project logo" border="0"/> |
| </a> |
| </td> |
| <td align='CENTER'> |
| <div class="pageBanner">Apache UIMA Addons and Sandbox</div> |
| </td> |
| <td align='RIGHT'> |
| <a href="https://www.apache.org"> |
| <img src="./images/asf-logo-on-white-smallTm.png" alt="Apache UIMA" border="0"/> |
| </a> |
| </td> |
| </tr> |
| </table> |
| <hr noshade="" size="1"/> |
| </div> |
| <table border="0" width="100%" cellspacing="4"> |
| <tr> |
| <td align='RIGHT' colspan="2"> |
| <form method="get" action="https://www.google.com/search"> |
| Search the site |
| <input type="text" name="q" size="25" maxlength="255" value="" /> |
| <input type="hidden" name="sitesearch" value="https://uima.apache.org/" /> |
| <input name="Search" value="Search Site" type="submit"/> |
| </form> |
| </td> |
| </tr> |
| <tr> <!-- LEFT SIDE NAVIGATION --> |
| <td width="20%" valign="top"> |
| |
| |
| |
| |
| |
| |
| <!-- regular menu --> |
| <div class="navBar"> |
| <br/> |
| <div class="navBarItem"> <div class="navPartHeading">General</div> |
| </div> |
| <div class="navBar"> |
| <div class="navBarItem"> <a href="./index.html">Home</a> |
| </div> |
| <div class="navBarItem"> <a href="./downloads.cgi">Downloads</a> |
| </div> |
| <div class="navBarItem"> <a href="./documentation.html">Documentation</a> |
| </div> |
| <div class="navBarItem"> <a href="./news.html">News</a> |
| </div> |
| <div class="navBarItem"> <a href="./publications.html">Publications</a> |
| </div> |
| <br style="line-height: .5em"/> |
| <div class="navBarItem"> <a href="https://issues.apache.org/jira/browse/uima" target="_blank" rel="noopener">Issue tracker <img src="images/offsitelink.png"/></a> |
| </div> |
| <div class="navBarItem"> <a href="https://cwiki.apache.org/confluence/display/UIMA/" target="_blank" rel="noopener">Wiki <img src="images/offsitelink.png"/></a> |
| </div> |
| <br style="line-height: .5em"/> |
| <div class="navBarItem"> <a href="https://cwiki.apache.org/confluence/display/UIMA/Powered+by+Apache+UIMA" target="_blank" rel="noopener">Powered By UIMA <img src="images/offsitelink.png"/></a> |
| </div> |
| </div> |
| <br/> |
| <div class="navBarItem"> <div class="navPartHeading">Community</div> |
| </div> |
| <div class="navBar"> |
| <div class="navBarItem"> <a href="./get-involved.html">Get Involved</a> |
| </div> |
| <div class="navBarItem"> <a href="./mail-lists.html">Mailing Lists</a> |
| </div> |
| <div class="navBarItem"> <a href="./contribution-policy.html">Contribution Policies</a> |
| </div> |
| <div class="navBarItem"> <a href="./faq.html">FAQ</a> |
| </div> |
| <div class="navBarItem"> <a href="./project-guidelines.html">Project Guidelines</a> |
| </div> |
| </div> |
| <br/> |
| <div class="navBarItem"> <div class="navPartHeading">Scaleout Frameworks</div> |
| </div> |
| <div class="navBar"> |
| <div class="navBarItem"> <a href="./doc-uimaas-what.html">UIMA-AS</a> |
| </div> |
| <div class="navBarItem"> <a href="./doc-uimaducc-whatitam.html">UIMA-DUCC</a> |
| </div> |
| <div class="navBarItem"> <a href="./doc-uimaducc-demo.html">..Demo Page</a> |
| </div> |
| <div class="navBarItem"> <a href="http://uima-ducc-demo.apache.org:42133" target="_blank" rel="noopener">..Demo Live <img src="images/offsitelink.png"/></a> |
| </div> |
| </div> |
| <br/> |
| <div class="navBarItem"> <div class="navPartHeading">Components &amp; Tools</div> |
| </div> |
| <div class="navBar"> |
| <div class="navBarItem"> <a href="./sandbox.html#uima-addons-annotators">Annotators</a> |
| </div> |
| <div class="navBarItem"> <a href="./toolsServers.html">Tools &amp; Servers</a> |
| </div> |
| <div class="navBarItem"> <a href="./sandbox.html">Addons and Sandbox</a> |
| </div> |
| <div class="navBarItem"> <a href="./ruta.html">UIMA Ruta</a> |
| </div> |
| <div class="navBarItem"> <a href="./uimafit.html">uimaFIT</a> |
| </div> |
| <div class="navBarItem"> <a href="./external-resources.html">External Resources</a> |
| </div> |
| </div> |
| <br/> |
| <div class="navBarItem"> <div class="navPartHeading">Development</div> |
| </div> |
| <div class="navBar"> |
| <div class="navBarItem"> <a href="./dev-quick.html">Quick Start: building</a> |
| </div> |
| <div class="navBarItem"> <a href="./building-uima.html">Building from Source</a> |
| </div> |
| <div class="navBarItem"> <a href="./one-time-setup.html">One-time setups</a> |
| </div> |
| <div class="navBarItem"> <a href="./svn.html">Source Code</a> |
| </div> |
| <div class="navBarItem"> <a href="./release.html">Doing a UIMA release</a> |
| </div> |
| <div class="navBarItem"> <a href="https://www.apache.org/security/committers.html" target="_blank" rel="noopener">Doing a CVE (Apache) <img src="images/offsitelink.png"/></a> |
| </div> |
| <div class="navBarItem"> <a href="./eclipse-update-site.html">Eclipse Update Sites</a> |
| </div> |
| <div class="navBarItem"> <a href="./git.html">GIT</a> |
| </div> |
| <div class="navBarItem"> <a href="./codeConventions.html">Code Conventions</a> |
| </div> |
| <div class="navBarItem"> <a href="./uima-specification.html">UIMA Specification (OASIS)</a> |
| </div> |
| <div class="navBarItem"> <a href="./team-list.html">Project Team</a> |
| </div> |
| <div class="navBarItem"> <a href="./maven-design.html">Maven Use</a> |
| </div> |
| <div class="navBarItem"> <a href="./updating-website.html">Updating this Website</a> |
| </div> |
| </div> |
| <br/> |
| <div class="navBarItem"> <div class="navPartHeading">Events and Conferences</div> |
| </div> |
| <div class="navBar"> |
| <div class="navBarItem"> <a href="./coling14.html">COLING 2014</a> |
| </div> |
| <div class="navBarItem"> <a href="./gscl13.html">GSCL 2013</a> |
| </div> |
| <div class="navBarItem"> <a href="./iks09.html">IKS 2009</a> |
| </div> |
| <div class="navBarItem"> <a href="./gscl09.html">GSCL 2009</a> |
| </div> |
| <div class="navBarItem"> <a href="./lsm09.html">LSM 2009</a> |
| </div> |
| <div class="navBarItem"> <a href="./lrec08.html">LREC 2008</a> |
| </div> |
| <div class="navBarItem"> <a href="./gldv07.html">GLDV 2007</a> |
| </div> |
| </div> |
| <br/> |
| <div class="navBarItem"> <div class="navPartHeading">ASF</div> |
| </div> |
| <div class="navBar"> |
| <div class="navBarItem"> <a href="https://www.apache.org/licenses/" target="_blank" rel="noopener">License <img src="images/offsitelink.png"/></a> |
| </div> |
| <div class="navBarItem"> <a href="https://www.apache.org/foundation/thanks.html" target="_blank" rel="noopener">ASF Sponsors <img src="images/offsitelink.png"/></a> |
| </div> |
| <div class="navBarItem"> <a href="https://www.apache.org/foundation/sponsorship.html" target="_blank" rel="noopener">ASF Sponsorship <img src="images/offsitelink.png"/></a> |
| </div> |
| <div class="navBarItem"> <a href="./security_report">Security</a> |
| </div> |
| </div> |
| </div> |
| </td> |
| <td width="80%" align="left" valign="top"> |
| <div class="sectionTable"> |
| <table class="sectionTable"> |
| <tr><td> |
| <a name="Apache UIMA Addons and Sandbox"><h1><img src="images/UIMA_4sq50tightCropSolid.png"/> Apache UIMA Addons and Sandbox</h1></a> |
| </td></tr> |
| <tr><td> |
| <blockquote class="sectionBody"> |
| <p> |
| The Apache UIMA™ Sandbox is a workspace that is open to all UIMA committers and developers who would like to |
| contribute code and join the UIMA developer community. |
| </p> |
| <p>Components often start in the Sandbox and, once ready for release, migrate to the Addons or to other parts of the site |
| as the Apache community integrates them. |
| </p> |
| <p>The Addons and Sandbox currently host analysis components |
| and tooling around UIMA. All the components are free to use and licensed under the |
| <a href="license.html">Apache Software License</a>. |
| A list of proposed analysis components and tooling for UIMA is available at the |
| <a href="https://cwiki.apache.org/confluence/display/UIMA/uima-sandbox-components.html">UIMA wiki</a> and can be discussed there. |
| </p> |
| <p> |
| You can access the UIMA Addons in the SVN repository at |
| <a class="external" href="https://svn.apache.org/repos/asf/uima/addons/trunk/"> |
| https://svn.apache.org/repos/asf/uima/addons/trunk/</a>. |
| Likewise, you can access the UIMA sandbox in the SVN repository at |
| <a class="external" href="https://svn.apache.org/repos/asf/uima/sandbox/trunk/"> |
| https://svn.apache.org/repos/asf/uima/sandbox/trunk/</a>. |
| </p> |
| <p> |
| The list below shows the currently available components of the UIMA Addons. |
| Many of these components are annotators. The Addons projects have been released; see the |
| <a href="downloads.cgi">download</a> page. |
| </p> |
| </blockquote> |
| </p> |
| </td></tr> |
| </table> |
| <div class="sectionTable"> |
| <table class="sectionTable"> |
| <tr><td> |
| <a name="UIMA Addons components"><h1><img src="images/UIMA_4sq50tightCropSolid.png"/> UIMA Addons components</h1></a> |
| </td></tr> |
| <tr><td> |
| <blockquote class="sectionBody"> |
| <h4 id="uima-addons-annotators">Annotators and Consumers</h4> |
| <ul> |
| <li><a href="#whitespace.tokenizer">Whitespace Tokenizer Annotator</a></li> |
| <li><a href="#snowball.annotator">Snowball Annotator</a></li> |
| <li><a href="#regex.annotator">Regular Expression Annotator</a></li> |
| <li><a href="#dict.annotator">Dictionary Annotator</a></li> |
| <li><a href="#tagger.annotator">Hidden Markov Model Tagger Annotator</a></li> |
| <li><a href="#bsf.annotator">BSF Annotator</a></li> |
| <li><a href="#opencalais.annotator">OpenCalais Annotator</a></li> |
| <li><a href="#concept.mapper.annotator">Concept Mapper Annotator</a></li> |
| <li><a href="#configurable.feature.extractor.annotator">Configurable Feature Extractor Annotator</a></li> |
| <li><a href="#tika.annotator">Tika Annotator</a></li> |
| <li><a href="#lucas.consumer">Lucene CAS indexer (Lucas)</a></li> |
| <li><a href="#alchemy.annotator">AlchemyAPI Annotator</a></li> |
| <li><a href="#solrcas.consumer">Solr CAS Consumer (Solrcas)</a></li> |
| </ul> |
| <h4 id="uima-addons-servers">Servers</h4> |
| <ul> |
| <li><a href="#simple-server">Simple Server (UIMA REST service)</a></li> |
| </ul> |
| <h4>Packaging tools</h4> |
| <ul> |
| <li><a href="#pear.package.task">PEAR Packaging ANT Task</a></li> |
| <li><a href="#pear.maven.task">PEAR Packaging Maven Plugin</a></li> |
| </ul> |
| <h4>Miscellaneous</h4> |
| <ul> |
| <li><a href="#fs.variables">Feature Structure Variables</a></li> |
| </ul> |
| <p>These are described in more detail below.</p> |
| <br /> |
| <table class="subsectionTable" id='whitespace.tokenizer'> |
| <tr><td> |
| |
| |
| |
| <a name="Whitespace Tokenizer Annotator"> |
| <h2>Whitespace Tokenizer Annotator |
| </h2> |
| </a> |
| </td></tr> |
| <tr><td> |
| <blockquote class="subsectionBody"> |
| <p> |
| The Whitespace Tokenizer annotator component provides a UIMA annotator implementation that tokenizes |
| text documents using simple whitespace segmentation. During tokenization, the annotator creates |
| token and sentence annotations. The Java source of the annotator |
| can be accessed in the SVN repository at |
| <a class="external" href="https://svn.apache.org/repos/asf/uima/addons/trunk/WhitespaceTokenizer"> |
| https://svn.apache.org/repos/asf/uima/addons/trunk/WhitespaceTokenizer</a>. |
| </p> |
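| <p> |
| As a rough illustration of what simple whitespace segmentation means, the following standalone Java sketch splits |
| a text at whitespace and records the character offsets of each token. It is not the annotator's source code: the class |
| <code>WhitespaceSketch</code> and its <code>tokenize</code> method are hypothetical names, no UIMA API is used, and |
| sentence detection is left out. |
| </p> |

```java
import java.util.ArrayList;
import java.util.List;

public class WhitespaceSketch {

    /**
     * Splits the text at whitespace.
     * Each result is a pair {begin, end}: begin is inclusive, end is exclusive.
     */
    public static List<int[]> tokenize(String text) {
        List<int[]> spans = new ArrayList<>();
        int start = -1; // -1 means "not currently inside a token"
        for (int i = 0; i < text.length(); i++) {
            if (Character.isWhitespace(text.charAt(i))) {
                if (start >= 0) {
                    // Whitespace ends the token that began at 'start'.
                    spans.add(new int[] {start, i});
                    start = -1;
                }
            } else if (start < 0) {
                // First non-whitespace character after a gap starts a token.
                start = i;
            }
        }
        if (start >= 0) {
            // The text ended while still inside a token.
            spans.add(new int[] {start, text.length()});
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "UIMA  tokenizes text";
        for (int[] span : tokenize(text)) {
            System.out.println(span[0] + "-" + span[1] + ": " + text.substring(span[0], span[1]));
        }
    }
}
```

| <p> |
| In a UIMA annotator, offsets like these would typically become the begin and end positions of token annotations. |
| </p> |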
| </blockquote> |
| </td></tr> |
| </table> |
| <table class="subsectionTable" id='snowball.annotator'> |
| <tr><td> |
| |
| |
| |
| <a name="Snowball Annotator"> |
| <h2>Snowball Annotator |
| </h2> |
| </a> |
| </td></tr> |
| <tr><td> |
| <blockquote class="subsectionBody"> |
| <p> |
| The Snowball annotator is a UIMA annotator component that wraps the Snowball stemming algorithm. The annotator |
| iterates over the available token annotations in the CAS and, for each token, creates a feature |
| containing the stem. |
| The stemming algorithm is available for several languages. For details about Snowball, please see |
| <a class="external" href="https://snowball.tartarus.org/">https://snowball.tartarus.org/</a>. |
| The Java source of the annotator can be accessed in the SVN repository at |
| <a class="external" href="https://svn.apache.org/repos/asf/uima/addons/trunk/SnowballAnnotator"> |
| https://svn.apache.org/repos/asf/uima/addons/trunk/SnowballAnnotator</a>. |
| </p> |
| <p> |
| Note: the implementation of the Snowball stemming algorithm used here is licensed under the BSD license. |
| </p> |
| </blockquote> |
| </td></tr> |
| </table> |
| <table class="subsectionTable" id='regex.annotator'> |
| <tr><td> |
| |
| |
| |
| <a name="Regular Expression Annotator"> |
| <h2>Regular Expression Annotator |
| </h2> |
| </a> |
| </td></tr> |
| <tr><td> |
| <blockquote class="subsectionBody"> |
| <p> |
| The Regular Expression Annotator (RegexAnnotator) is an Apache UIMA analysis engine that |
| detects entities like email addresses, URLs, phone numbers, zip codes or any other entity |
| based on regular expressions and concepts. For each detected entity, an annotation |
| can be created, or an existing annotation can be updated with feature values. |
| <a href="d/uima-addons-current/RegularExpressionAnnotator/RegexAnnotatorUserGuide.html"> |
| Click here to access the user documentation</a>. |
| The Java source of the annotator can be accessed in the SVN repository at |
| <a class="external" href="https://svn.apache.org/repos/asf/uima/addons/trunk/RegularExpressionAnnotator"> |
| https://svn.apache.org/repos/asf/uima/addons/trunk/RegularExpressionAnnotator</a>. |
| </p> |
| </blockquote> |
| </td></tr> |
| </table> |
| <table class="subsectionTable" id='pear.package.task'> |
| <tr><td> |
| |
| |
| |
| <a name="PEAR Packaging ANT Task"> |
| <h2>PEAR Packaging ANT Task |
| </h2> |
| </a> |
| </td></tr> |
| <tr><td> |
| <blockquote class="subsectionBody"> |
| <p> |
| The PEAR packaging ANT task component is a project to create UIMA PEAR packages automatically |
| during a component build using a custom |
| <a class="external" href="https://ant.apache.org/">Apache ANT</a> task. With this task, |
| users are able to build their components from the source and then package them |
| automatically as a UIMA PEAR package. |
| <a href="d/uima-addons-current/PearPackagingAntTask/PearPackagingAntTaskUserGuide.html"> |
| Click here to access the user documentation</a>. |
| The Java source of the PEAR packaging task can be accessed in the SVN repository at |
| <a class="external" href="https://svn.apache.org/repos/asf/uima/addons/trunk/PearPackagingAntTask"> |
| https://svn.apache.org/repos/asf/uima/addons/trunk/PearPackagingAntTask</a>. |
| </p> |
| </blockquote> |
| </td></tr> |
| </table> |
| <table class="subsectionTable" id='pear.maven.task'> |
| <tr><td> |
| |
| |
| |
| <a name="PEAR Packaging Maven Plugin"> |
| <h2>PEAR Packaging Maven Plugin |
| </h2> |
| </a> |
| </td></tr> |
| <tr><td> |
| <blockquote class="subsectionBody"> |
| <p>Note: The PEAR Packaging Maven Plugin has been moved to the main UIMA Java Framework and SDK package.</p> |
| <p> |
| The PEAR packaging Maven plugin component is a project to create UIMA PEAR packages automatically |
| during a component build using a custom Maven plugin. |
| With this plugin, users are able to build their components from the source and then package them |
| automatically as a UIMA PEAR package. |
| <a href="d/uimaj-current/tools.html#ugr.tools.pear"> |
| Click here to access the user documentation</a>. |
| The Java source of the PEAR packaging Maven plugin can be accessed in the SVN repository at |
| <a class="external" href="https://svn.apache.org/repos/asf/uima/uimaj/trunk/PearPackagingMavenPlugin"> |
| https://svn.apache.org/repos/asf/uima/uimaj/trunk/PearPackagingMavenPlugin</a>. |
| </p> |
| </blockquote> |
| </td></tr> |
| </table> |
| <table class="subsectionTable" id='dict.annotator'> |
| <tr><td> |
| |
| |
| |
| <a name="Dictionary Annotator"> |
| <h2>Dictionary Annotator |
| </h2> |
| </a> |
| </td></tr> |
| <tr><td> |
| <blockquote class="subsectionBody"> |
| <p> |
| The Dictionary Annotator is an Apache UIMA analysis engine that creates annotations based on word lists |
| that are compiled to simple dictionaries. The output annotation type for the created annotations |
| and the input annotation type on which the dictionary lookup is executed can be specified individually. |
| <a href="d/uima-addons-current/DictionaryAnnotator/DictionaryAnnotatorUserGuide.html"> |
| Click here to access the user documentation</a>. |
| The Java source of the annotator can be accessed in the SVN repository at |
| <a class="external" href="https://svn.apache.org/repos/asf/uima/addons/trunk/DictionaryAnnotator"> |
| https://svn.apache.org/repos/asf/uima/addons/trunk/DictionaryAnnotator</a>. |
| </p> |
| </blockquote> |
| </td></tr> |
| </table> |
| <table class="subsectionTable" id='fs.variables'> |
| <tr><td> |
| |
| |
| |
| <a name="Feature Structure Variables"> |
| <h2>Feature Structure Variables |
| </h2> |
| </a> |
| </td></tr> |
| <tr><td> |
| <blockquote class="subsectionBody"> |
| <p> |
| The Feature Structure variables project allows you to create named feature structure instances. |
| It further allows you to refer to individual feature structures or annotations across annotators, |
| without creating a special index. |
| <a href="d/uima-addons-current/FsVariables/fsVariablesUserGuide.html"> |
| Click here to access the user documentation</a>. |
| The Java source of the project can be accessed in the SVN repository at |
| <a class="external" href="https://svn.apache.org/repos/asf/uima/addons/trunk/FsVariables"> |
| https://svn.apache.org/repos/asf/uima/addons/trunk/FsVariables</a>. |
| </p> |
| </blockquote> |
| </td></tr> |
| </table> |
| <table class="subsectionTable" id='tagger.annotator'> |
| <tr><td> |
| |
| |
| |
| <a name="Hidden Markov Model Tagger Annotator"> |
| <h2>Hidden Markov Model Tagger Annotator |
| </h2> |
| </a> |
| </td></tr> |
| <tr><td> |
| <blockquote class="subsectionBody"> |
| <p> |
| The Tagger Annotator component implements a Hidden Markov Model (HMM) tagger. The tagger assumes that |
| sentences and tokens have already been annotated in the CAS with sentence and token annotations. |
| It then iterates over the sentences and their tokens in turn to accumulate a list of words, and invokes the |
| tagger on this list. The HMM tagger employs the Viterbi algorithm to calculate the most probable tag sequence. |
| For each token, it updates the posTag field with the part-of-speech tag. |
| Model training happens outside of UIMA; the tagger only reads statistical information from |
| a model file, which is passed to the tagger along with some further parameters through a properties file. |
| <a href="d/uima-addons-current/Tagger/hmmTaggerUsersGuide.html"> |
| Click here to access the user documentation</a>. |
| The Java source of the annotator can be accessed in the SVN repository at |
| <a class="external" href="https://svn.apache.org/repos/asf/uima/addons/trunk/Tagger"> |
| https://svn.apache.org/repos/asf/uima/addons/trunk/Tagger</a>. |
| </p> |
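| <p> |
| To make the Viterbi step more concrete, here is a minimal, self-contained Java sketch of the textbook algorithm for a |
| first-order HMM. It is an illustration only, not the Tagger's implementation: the class <code>ViterbiSketch</code>, the |
| <code>decode</code> method, the array layout and the toy probabilities in <code>main</code> are all assumptions made |
| for this example. States stand for tags and observations stand for word indices. |
| </p> |

```java
import java.util.Arrays;

public class ViterbiSketch {

    /**
     * Returns the most probable state (tag) sequence for the observations.
     * start[s]    : probability of starting in state s
     * trans[p][s] : probability of moving from state p to state s
     * emit[s][o]  : probability that state s emits observation o
     */
    public static int[] decode(int[] obs, double[] start, double[][] trans, double[][] emit) {
        int n = obs.length;
        int k = start.length;
        if (n == 0) {
            return new int[0];
        }
        double[][] score = new double[n][k]; // best log-probability of a path ending in state s at step t
        int[][] back = new int[n][k];        // predecessor of state s on that best path

        for (int s = 0; s < k; s++) {
            score[0][s] = Math.log(start[s]) + Math.log(emit[s][obs[0]]);
        }
        for (int t = 1; t < n; t++) {
            for (int s = 0; s < k; s++) {
                int bestPrev = 0;
                double best = Double.NEGATIVE_INFINITY;
                for (int p = 0; p < k; p++) {
                    double candidate = score[t - 1][p] + Math.log(trans[p][s]);
                    if (candidate > best) {
                        best = candidate;
                        bestPrev = p;
                    }
                }
                score[t][s] = best + Math.log(emit[s][obs[t]]);
                back[t][s] = bestPrev;
            }
        }
        // Pick the best final state, then follow the back-pointers to the start.
        int last = 0;
        for (int s = 1; s < k; s++) {
            if (score[n - 1][s] > score[n - 1][last]) {
                last = s;
            }
        }
        int[] path = new int[n];
        path[n - 1] = last;
        for (int t = n - 1; t > 0; t--) {
            path[t - 1] = back[t][path[t]];
        }
        return path;
    }

    public static void main(String[] args) {
        // Toy model with made-up numbers: states 0 = DET, 1 = NOUN; observations 0 = "the", 1 = "dog".
        double[] start = {0.8, 0.2};
        double[][] trans = {{0.1, 0.9}, {0.5, 0.5}};
        double[][] emit = {{0.9, 0.1}, {0.1, 0.9}};
        System.out.println(Arrays.toString(decode(new int[] {0, 1}, start, trans, emit)));
    }
}
```

| <p> |
| The sketch works in log space so that long sentences do not underflow to zero. Where the real tagger obtains its |
| probabilities is determined by the model file mentioned above. |
| </p> |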
| </blockquote> |
| </td></tr> |
| </table> |
| <table class="subsectionTable" id='bsf.annotator'> |
| <tr><td> |
| |
| |
| |
| <a name="BSF Annotator"> |
| <h2>BSF Annotator |
| </h2> |
| </a> |
| </td></tr> |
| <tr><td> |
| <blockquote class="subsectionBody"> |
| <p> |
| The Bean Scripting Framework (BSF) Annotator is an Apache UIMA analysis engine that provides |
| a link between the UIMA framework and the scripting languages that are supported by |
| Apache BSF (<a class="external" href="https://jakarta.apache.org/bsf">https://jakarta.apache.org/bsf</a>). |
| The current implementation comes with examples in Beanshell |
| (<a class="external" href="https://www.beanshell.org">https://www.beanshell.org</a>) and Rhino Javascript |
| (<a class="external" href="https://www.mozilla.org/rhino">https://www.mozilla.org/rhino</a>). |
| Simple tests have also been conducted successfully with Jython |
| (<a class="external" href="https://jython.sourceforge.net/Project/index.html">https://jython.sourceforge.net/Project/index.html</a>) |
| and JRuby (<a class="external" href="https://jruby.codehaus.org">https://jruby.codehaus.org</a>). |
| The annotator takes the source file containing the script as a parameter. |
| The script is expected to implement the initialize and process functions of the analysis engine. |
| A scripting language can be very handy for quick prototyping, pre- and post-processing, CAS cleaning tasks, or |
| type system conversion and adaptation. |
| The Java source of the annotator can be accessed from the SVN repository at |
| <a class="external" href="https://svn.apache.org/repos/asf/uima/addons/trunk/BSFAnnotator"> |
| https://svn.apache.org/repos/asf/uima/addons/trunk/BSFAnnotator</a>. |
| </p> |
| </blockquote> |
| </td></tr> |
| </table> |
| <table class="subsectionTable" id='tika.annotator'> |
| <tr><td> |
| |
| |
| |
| <a name="Tika Annotator"> |
| <h2>Tika Annotator |
| </h2> |
| </a> |
| </td></tr> |
| <tr><td> |
| <blockquote class="subsectionBody"> |
| <p> |
| Apache Tika is a toolkit for detecting and extracting metadata and |
| structured text content from various documents using existing parser |
| libraries. The TikaAnnotator uses |
| <a href="https://lucene.apache.org/tika/" target="_blank" rel="noopener">Tika</a> |
| to generate annotations representing |
| the original markup of a document and to extract its text and metadata. It |
| consists of three resources: |
| </p> |
| <dl> |
| <dt>FileSystemCollectionReader</dt> |
| <dd>similar to the one in the UIMA examples, but uses |
| Tika to extract the text from binary documents and generates annotations to |
| represent the markup</dd> |
| |
| <dt>MarkupAnnotator</dt> |
| <dd>takes the original content from a view and generates a |
| new view containing the extracted text with markup annotations</dd> |
| |
| <dt>TikaWrapper</dt> |
| <dd>utility class that populates a CAS from a binary |
| document; used by the FileSystemCollectionReader</dd> |
| </dl> |
| </blockquote> |
| </td></tr> |
| </table> |
| <table class="subsectionTable" id='lucas.consumer'> |
| <tr><td> |
| |
| |
| |
| <a name="Lucene CAS indexer (Lucas)"> |
| <h2>Lucene CAS indexer (Lucas) |
| </h2> |
| </a> |
| </td></tr> |
| <tr><td> |
| <blockquote class="subsectionBody"> |
| <p> |
| The Lucene CAS indexer (Lucas) is a UIMA CAS consumer that stores CAS |
| data in a <a href="https://lucene.apache.org">Lucene</a> index. The consumer |
| transforms annotation objects of a CAS into Lucene token streams |
| which are stored in a Lucene document. Token streams can further be processed |
| by token filters. Lucas comes with a set of its own token filters and |
| integrations for some Lucene token filters. Furthermore, you can |
| deploy your own token filters. The mapping between UIMA annotations and Lucene |
| tokens, as well as the token filtering, is configured in an XML mapping file.</p> |
| <p><a href="d/uima-addons-current/Lucas/LuceneCASConsumerUserGuide.html"> |
| Click here to access the user documentation</a>. |
| The Java source of the consumer can be accessed in the |
| <a href="https://svn.apache.org/repos/asf/uima/addons/trunk/Lucas"> |
| SVN repository</a>. |
| </p> |
| </blockquote> |
| </td></tr> |
| </table> |
| <table class="subsectionTable" id='simple-server'> |
| <tr><td> |
| |
| |
| |
| <a name="Simple Server (UIMA REST Service)"> |
| <h2>Simple Server (UIMA REST Service) |
| </h2> |
| </a> |
| </td></tr> |
| <tr><td> |
| <blockquote class="subsectionBody"> |
| <p> |
| The UIMA Simple Server makes results of UIMA processing |
| available in a simple, XML-based format. The intended use of |
| the Simple Server is to provide UIMA analysis as a REST |
| service. The Simple Server is implemented as a Java Servlet, |
| and can be deployed into any Servlet container (such as |
| Apache Tomcat or Jetty).</p> |
| <p><a href="d/uima-addons-current/SimpleServer/simpleServerUserGuide.html"> |
| Click here to access the user documentation</a>. |
| The Java source of the server can be accessed from the |
| SVN repository at |
| <a class="external" href="https://svn.apache.org/repos/asf/uima/addons/trunk/SimpleServer"> |
| https://svn.apache.org/repos/asf/uima/addons/trunk/SimpleServer |
| </a>. |
| </p> |
| </blockquote> |
| </td></tr> |
| </table> |
| <table class="subsectionTable" id='opencalais.annotator'> |
| <tr><td> |
| |
| |
| |
| <a name="OpenCalais Annotator"> |
| <h2>OpenCalais Annotator |
| </h2> |
| </a> |
| </td></tr> |
| <tr><td> |
| <blockquote class="subsectionBody"> |
| <p> |
| The OpenCalais Annotator component wraps the |
| <a class="external" href="https://www.opencalais.com">OpenCalais</a> |
| web service and makes the OpenCalais analysis results available in UIMA. OpenCalais can detect a large variety |
| of entities, facts and events, for example Persons, Companies, Acquisitions and Mergers. |
| For details about the OpenCalais analytics and the license to use the service, please refer to the |
| <a class="external" href="https://www.opencalais.com">OpenCalais</a> website. |
| The Java source of the annotator can be accessed in the SVN repository at |
| <a class="external" href="https://svn.apache.org/repos/asf/uima/addons/trunk/OpenCalaisAnnotator"> |
| https://svn.apache.org/repos/asf/uima/addons/trunk/OpenCalaisAnnotator</a>. |
| </p> |
| </blockquote> |
| </td></tr> |
| </table> |
| <table class="subsectionTable" id='concept.mapper.annotator'> |
| <tr><td> |
| |
| |
| |
| <a name="Concept Mapper Annotator"> |
| <h2>Concept Mapper Annotator |
| </h2> |
| </a> |
| </td></tr> |
| <tr><td> |
| <blockquote class="subsectionBody"> |
| <p> |
| ConceptMapper is a powerful, highly configurable, dictionary-based UIMA annotator. |
| </p> |
| <p> |
| Numerous parameters can be used to specify various aspects of the lookup algorithm, input processing and output options. |
| The dictionary structure is flexible, allowing any number of synonyms to be associated with an entry, |
| and any number of attributes to be associated with entries or synonyms. |
| </p> |
| <p>ConceptMapper is separately released, and available on the downloads page.</p> |
| <p> |
| Lookup and matching against dictionary entries can be performed against |
| contiguous or non-contiguous blocks of text, and token order independent |
| lookup is also allowed (for example, the tokens "A" "B" would be considered |
| a match against dictionary entry "B" "A"). |
| </p> |
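| <p> |
| The idea of token order independent lookup can be sketched in a few lines of standalone Java. This is only an |
| illustration of the concept, not ConceptMapper's lookup algorithm: the class <code>OrderIndependentMatchSketch</code> |
| and its <code>matches</code> method are hypothetical, and the sketch assumes an exact, case-sensitive comparison of |
| complete token lists. |
| </p> |

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class OrderIndependentMatchSketch {

    /** Returns true if both lists contain the same tokens the same number of times, in any order. */
    public static boolean matches(List<String> textTokens, List<String> entryTokens) {
        // Sort copies so that the callers' lists are left untouched.
        List<String> text = new ArrayList<>(textTokens);
        List<String> entry = new ArrayList<>(entryTokens);
        Collections.sort(text);
        Collections.sort(entry);
        return text.equals(entry);
    }

    public static void main(String[] args) {
        // Tokens "A" "B" compared with a dictionary entry "B" "A".
        System.out.println(matches(Arrays.asList("A", "B"), Arrays.asList("B", "A")));
    }
}
```

| <p> |
| Sorting both sides reduces the comparison to a plain list equality check; an order-sensitive lookup would compare |
| the original lists directly instead. |
| </p> |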
| <p> |
| Additionally, ConceptMapper can be configured to use any tokenizer annotator, |
| enabling tokenization of the dictionary identically with the input text. |
| </p> |
| <p><a href="d/uima-addons-current/ConceptMapper/ConceptMapperAnnotatorUserGuide.html"> |
| Click here to access the user documentation</a>.</p> |
| </blockquote> |
| </td></tr> |
| </table> |
| <table class="subsectionTable" id='configurable.feature.extractor.annotator'> |
| <tr><td> |
| |
| |
| |
| <a name="Configurable Feature Extractor Annotator"> |
| <h2>Configurable Feature Extractor Annotator |
| </h2> |
| </a> |
| </td></tr> |
| <tr><td> |
| <blockquote class="subsectionBody"> |
| <p> |
| The Configurable Feature Extractor (CFE) Annotator is a multipurpose tool |
| that enables feature extraction from a UIMA CAS in a very generalized and |
| application-independent way. </p> |
| <p>The extraction process is performed according to rules expressed using the |
| <b>Feature Extraction Specification Language (FESL)</b> that are stored |
| in configuration files.</p> |
| <p>Using CFE eliminates the need for creating customized CAS consumers and |
| writing Java code for every application. Instead, by using FESL rules in XML format, |
| users can customize the information extraction process to suit their application. |
| FESL's rule semantics allow the information to be extracted to be identified |
| precisely by specifying multi-parameter criteria. |
| </p> |
| <p><a href="d/uima-addons-current/ConfigurableFeatureExtractor/CFE_UG.html"> |
| Click here to access the user documentation</a>.</p> |
| </blockquote> |
| </td></tr> |
| </table> |
| <table class="subsectionTable" id='alchemy.annotator'> |
| <tr><td> |
| |
| |
| |
| <a name="AlchemyAPI Annotator"> |
| <h2>AlchemyAPI Annotator |
| </h2> |
| </a> |
| </td></tr> |
| <tr><td> |
| <blockquote class="subsectionBody"> |
| <p>The AlchemyAPI Annotator is a wrapper for the AlchemyAPI web services, which |
| provide text enrichment facilities such as categorization, entity extraction, |
| language identification, keyword extraction and concept tagging. |
| </p> |
| <p><a href="d/uima-addons-current/AlchemyAPIAnnotator/AlchemyAPIAnnotatorUserGuide.html"> |
| Click here to access the user documentation</a>.</p> |
| </blockquote> |
| </td></tr> |
| </table> |
| <table class="subsectionTable" id='solrcas.consumer'> |
| <tr><td> |
| |
| |
| |
| <a name="Solr CAS Consumer (Solrcas)"> |
| <h2>Solr CAS Consumer (Solrcas) |
| </h2> |
| </a> |
| </td></tr> |
| <tr><td> |
| <blockquote class="subsectionBody"> |
| <p> The Solr CAS Consumer (Solrcas) transforms CAS objects |
| into Solr documents and writes them to a remote or local Solr instance, |
| providing search capabilities on top of UIMA pipelines with |
| the Apache Solr search server. |
| </p> |
| <p><a href="d/uima-addons-current/Solrcas/SolrcasUserGuide.html"> |
| Click here to access the user documentation</a>.</p> |
| </blockquote> |
| </td></tr> |
| </table> |
| </blockquote> |
| </p> |
| </td></tr> |
| </table> |
| <div class="sectionTable"> |
| <table class="sectionTable"> |
| <tr><td> |
| <a name="UIMA Sandbox components"><h1><img src="images/UIMA_4sq50tightCropSolid.png"/> UIMA Sandbox components</h1></a> |
| </td></tr> |
| <tr><td> |
| <blockquote class="sectionBody"> |
| <p>Some of these components are currently available only in SVN.</p> |
| <h4>Annotators and Consumers</h4> |
| <ul> |
| <li><a href="#rdfcas.consumer">RDF CAS Consumer</a></li> |
| </ul> |
| <h4>Miscellaneous</h4> |
| <ul> |
| <li><a href="#gale.multimodal.example">GALE Multi-Modal Example</a></li> |
| </ul> |
| <p>These are described in more detail below.</p> |
| <br /> |
| <table class="subsectionTable" id='rdfcas.consumer'> |
| <tr><td> |
| |
| |
| |
| <a name="RDF CAS Consumer"> |
| <h2>RDF CAS Consumer |
| </h2> |
| </a> |
| </td></tr> |
| <tr><td> |
| <blockquote class="subsectionBody"> |
| <p>The RDF CAS Consumer takes a CAS view and |
| writes it to a file in an RDF format; this is useful for connecting UIMA |
| pipelines to RDF-backed systems (using ontologies, reasoners, etc.). |
| </p> |
| </blockquote> |
| </td></tr> |
| </table> |
| <table class="subsectionTable" id='gale.multimodal.example'> |
| <tr><td> |
| |
| |
| |
| <a name="GALE Multi-Modal Example"> |
| <h2>GALE Multi-Modal Example |
| </h2> |
| </a> |
| </td></tr> |
| <tr><td> |
| <blockquote class="subsectionBody"> |
| <p> The GALE Multi-Modal Example contains a type system and sample code based on a |
| rich multimodal application developed under the DARPA GALE project to demonstrate how to combine |
| analytics from multiple sources and modalities. The GALE Type System (GTS) has been designed |
| for applications that combine analytics from multiple sources and modalities, such as speech |
| recognition, language translation, entity detection, topic detection, speech synthesis, etc. |
| </p> |
| <p> |
| The sample code illustrates how to wrap NLP analytics as UIMA annotators using |
| appropriate GTS types, as well as data-reorganization components that convert the output of each |
| analytic into a form suitable for the following analytics, and add |
| cross-reference links back to the original data. |
| </p> |
| <p> The type system descriptors can be accessed from the SVN repository at |
| <a class="external" href="https://svn.apache.org/repos/asf/uima/sandbox/trunk/GaleMultiModalExample"> |
| https://svn.apache.org/repos/asf/uima/sandbox/trunk/GaleMultiModalExample |
| </a>. |
| </p> |
| </blockquote> |
| </td></tr> |
| </table> |
| </blockquote> |
| </p> |
| </td></tr> |
| </table> |
| </td> |
| </tr> |
| <!-- FOOTER --> |
| <tr><td colspan="2"> |
| <hr noshade="" size="1"/> |
| </td></tr> |
| <tr><td colspan="2"> |
| <table class="pageFooter"> |
| <tr> |
| <td><a href="index.html">Home</a></td> |
| <td><a href="privacy-policy.html">Privacy Policy</a></td> |
| <td style="font-size:75%"> |
| Copyright © 2006-2013, The Apache Software Foundation.<br/> |
| Apache UIMA, UIMA, the Apache UIMA logo and the Apache Feather logo are trademarks of The Apache Software Foundation.<br/> |
| All other marks mentioned may be trademarks or registered trademarks of their respective owners. |
| </td> |
| <td><a href="mailto:dev@uima.apache.org">Contact us</a></td> |
| </tr> |
| </table> |
| </td></tr> |
| </table> |
| </body> |
| </html> |
| |