| <?xml version="1.0" encoding="UTF-8"?> |
| <!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" |
| "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd"[ |
| <!ENTITY imgroot "../images/tools/tools.doc_analyzer/" > |
| <!ENTITY % uimaents SYSTEM "../entities.ent" > |
| %uimaents; |
| ]> |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| --> |
| <chapter id="ugr.tools.doc_analyzer"> |
| <title>Document Analyzer User's Guide</title> |
| |
| |
| <para>The <emphasis>Document Analyzer</emphasis> is a tool provided by the |
| UIMA SDK for testing annotators and Analysis Engines (AEs). It reads text files from your disk, processes them using an AE, and |
| allows you to view the results. The |
| Document Analyzer is designed to work with text files and cannot be used with |
| Analysis Engines that process other types of data.</para> |
| |
| <para>For an introduction to developing annotators and Analysis |
| Engines, read |
| <olink targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.aae"/>. |
| This chapter is a guide to using the Document Analyzer tool; it |
| does not describe the process of developing annotators and Analysis Engines.</para> |
| |
| <section id="ugr.tools.doc_analyzer.starting"> |
| <title>Starting the Document Analyzer</title> |
| |
| <para>To run the Document Analyzer, execute the <literal>documentAnalyzer</literal> script that is in the <literal>bin</literal> directory of your UIMA SDK installation, or, if you |
| are using the example Eclipse project, execute the <quote>UIMA Document Analyzer</quote> |
| run configuration supplied with that project.</para> |
| |
| <para>Note that if you're planning to run an Analysis Engine |
| other than one of the examples included in the UIMA SDK, you'll first need to |
| update your CLASSPATH environment variable to include the classes needed by |
| that Analysis Engine.</para> |
| |
| <para>When you first run the Document Analyzer, you should see a |
| screen that looks like this: |
| |
| <screenshot> |
| <mediaobject> |
| <imageobject> |
| <imagedata width="5.8in" format="JPG" fileref="&imgroot;image002.jpg"/> |
| </imageobject> |
| <textobject><phrase>Document Analyzer GUI</phrase> |
| </textobject> |
| </mediaobject> |
| </screenshot></para> |
| |
| |
| </section> |
| |
| <section id="ugr.tools.doc_analyzer.running_an_ae"> |
| <title>Running an AE</title> |
| |
| |
| |
| <para>To run an AE, you must first configure the six fields on |
| the main screen of the Document Analyzer.</para> |
| |
| <para><emphasis role="bold">Input Directory:</emphasis> |
| Browse to or type the path of a directory containing text files that you |
| want to analyze. Some sample documents |
| are provided in the UIMA SDK under the <literal>examples/data</literal> |
| directory.</para> |
| |
| <para><emphasis role="bold">Output Directory:</emphasis> Browse to or type the path of a directory where you want |
| output to be written. (As we'll see later, you won't normally need to look directly at these files, but the |
| Document Analyzer needs to know where to write them.) The files written to this directory contain an XML |
| representation of the analyzed documents. If this directory doesn't exist, it will be created. If the |
| directory exists, any files in it will be deleted (but the tool will ask you to confirm this before doing so). If you |
| leave this field blank, your AE will be run but no output will be generated.</para> |
| |
| <para><emphasis role="bold">Location of AE XML Descriptor:</emphasis> |
| Browse to or type the path of the descriptor |
| for the AE that you want to run. There |
| are some example descriptors provided in the UIMA SDK under the <literal>examples/descriptors/analysis_engine</literal> and <literal>examples/descriptors/tutorial</literal> directories.</para> |
| |
| <para><emphasis role="bold">XML Tag containing Text:</emphasis> |
| This is an optional feature. If you enter a value here, it specifies the |
| name of an XML tag, expected to be found within the input documents, that |
| contains the text to be analyzed. For |
| example, the value <literal>TEXT</literal> would cause the AE to analyze only |
| the portion of the document enclosed within <literal>&lt;TEXT&gt;...&lt;/TEXT&gt;</literal> |
| tags. Any XML tags occurring within that text will also be removed prior to analysis.</para> |
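<para>As an illustration of what this field does, the following Java sketch pulls the text of the first element with a given tag name out of a document and removes any markup nested inside it. It is a minimal sketch under stated assumptions, not the Document Analyzer's own implementation: the class <literal>XmlTagTextSketch</literal> and its <literal>extractText</literal> method are hypothetical, and the regular expressions assume simple input in which the tag carries no attributes and the document contains no comments or CDATA sections.</para>
<programlisting><![CDATA[
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class XmlTagTextSketch {

    /**
     * Hypothetical helper: returns the content of the first <tag>...</tag>
     * element with any nested markup removed, or null if the tag is absent.
     * Regex-based, so it assumes the tag has no attributes and that the
     * document contains no comments or CDATA sections.
     */
    static String extractText(String document, String tag) {
        String name = Pattern.quote(tag);
        // Non-greedy match so only the first element is selected; DOTALL lets it span lines.
        Pattern element = Pattern.compile("<" + name + ">(.*?)</" + name + ">", Pattern.DOTALL);
        Matcher m = element.matcher(document);
        if (!m.find()) {
            return null;
        }
        // Remove any tags that occur inside the selected element.
        return m.group(1).replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        String doc = "<DOC><TITLE>Memo</TITLE>"
                + "<TEXT>Meet <b>John</b> at 10am.</TEXT></DOC>";
        System.out.println(extractText(doc, "TEXT")); // Meet John at 10am.
    }
}
```
]]></programlisting>
<para>Regular expressions keep the sketch short; code that has to handle arbitrary XML input would normally use a real XML parser instead.</para>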
| |
| <para><emphasis role="bold">Language:</emphasis> |
| Specify the language in which the documents are written. Some, but not all, Analysis Engines require |
| that this be set correctly in order to do their analysis. You can select a value from the drop-down |
| list or type your own. The value entered |
| here must be an ISO language identifier; the list of identifiers can be found at |
| <ulink url="http://www.ics.uci.edu/pub/ietf/http/related/iso639.txt"/>. |
| </para> |
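<para>If you are unsure which identifier to enter, the JDK can list the two-letter ISO 639 language codes through <literal>java.util.Locale.getISOLanguages()</literal>. The sketch below uses that list to check a candidate value and print the language's English name. The class <literal>LanguageCodeSketch</literal> and its methods are hypothetical, and the sketch only helps you look up codes; it assumes nothing about how, or whether, the Document Analyzer validates this field.</para>
<programlisting><![CDATA[
```java
import java.util.Arrays;
import java.util.Locale;

public class LanguageCodeSketch {

    /** True if the code is one of the two-letter ISO 639 codes known to this JDK. */
    static boolean isIso639Code(String code) {
        return Arrays.asList(Locale.getISOLanguages()).contains(code);
    }

    /** English display name for a language code, for example "German" for "de". */
    static String displayName(String code) {
        return Locale.forLanguageTag(code).getDisplayLanguage(Locale.ENGLISH);
    }

    public static void main(String[] args) {
        String[] candidates = {"en", "de", "english"};
        for (String code : candidates) {
            String result = isIso639Code(code) ? displayName(code) : "not an ISO 639 code";
            System.out.println(code + " -> " + result);
        }
        // Prints:
        // en -> English
        // de -> German
        // english -> not an ISO 639 code
    }
}
```
]]></programlisting>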
| |
| <para><emphasis role="bold">Character Encoding:</emphasis> |
| The character encoding of the input files. The default, UTF-8, also works for plain ASCII |
| text files, because ASCII is a subset of UTF-8. If your files use a different |
| encoding, enter its name here. For more |
| information on character sets and their names, see the Javadocs for |
| <literal>java.nio.charset.Charset</literal>.</para> |
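<para>The following Java sketch shows why the encoding setting matters, using only the standard <literal>java.nio.charset</literal> API. The class <literal>EncodingSketch</literal> and its <literal>decode</literal> helper are hypothetical and are not part of the UIMA SDK. The sketch decodes the same bytes with a correct and an incorrect encoding name, and shows that ASCII-only bytes decode identically under UTF-8.</para>
<programlisting><![CDATA[
```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingSketch {

    /** Decodes file bytes using the named encoding, e.g. a value typed into the Character Encoding field. */
    static String decode(byte[] bytes, String encodingName) {
        // Charset.forName throws UnsupportedCharsetException for names this JVM does not know.
        return new String(bytes, Charset.forName(encodingName));
    }

    public static void main(String[] args) {
        // Check whether an encoding name is usable before relying on it.
        System.out.println(Charset.isSupported("ISO-8859-1")); // true

        // ASCII-only bytes decode identically as UTF-8, which is why the default works for ASCII files.
        byte[] ascii = "plain text".getBytes(StandardCharsets.US_ASCII);
        System.out.println(decode(ascii, "UTF-8")); // plain text

        // "cafe" with an accented final e, saved as ISO-8859-1: the single byte for the
        // accented letter is not valid UTF-8, so decoding it as UTF-8 garbles that character.
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(decode(latin1, "ISO-8859-1").equals("caf\u00e9")); // true
        System.out.println(decode(latin1, "UTF-8").equals("caf\u00e9"));      // false
    }
}
```
]]></programlisting>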
| |
| <para>Once you've filled in the appropriate values, press the |
| <quote>Run</quote> button.</para> |
| |
| <para>If an error occurs, a dialog will appear with the error |
| message. (A stack trace will also be |
| printed to the console, which may help you if the error was generated by your |
| own annotator code.) Otherwise, an |
| <quote>Analysis Results</quote> window will appear.</para> |
| |
| |
| |
| </section> |
| |
| <section id="ugr.tools.doc_analyzer.viewing_results"> |
| <title>Viewing the Analysis Results</title> |
| |
| <para>After a successful analysis, the <quote>Analysis |
| Results</quote> window will appear. |
| |
| <screenshot> |
| <mediaobject> |
| <imageobject> |
| <imagedata width="4.2in" format="JPG" fileref="&imgroot;image004.jpg"/> |
| </imageobject> |
| <textobject><phrase>Analysis Results Window</phrase></textobject> |
| </mediaobject> |
| </screenshot></para> |
| |
| |
| <para>The <quote>Results Display Format</quote> options at the |
| bottom of this window offer several ways to view your analysis: |
| Java Viewer, Java Viewer (JV) with User Colors, HTML, and XML. |
| The default, Java Viewer, is recommended.</para> |
| |
| <para>Once you have selected your desired Results Display |
| Format, you can double-click on one of the files in the list to view the |
| analysis done on that file.</para> |
| |
| <para>For the Java viewer, the results display looks like this |
| (for the AE descriptor <literal>examples/descriptors/tutorial/ex4/MeetingDetectorAE.xml</literal>): |
| |
| <screenshot> |
| <mediaobject> |
| <imageobject> |
| <imagedata width="5.8in" format="JPG" fileref="&imgroot;image006.jpg"/> |
| </imageobject> |
| <textobject><phrase>Analysis Results Window showing results from tutorial example 4</phrase></textobject> |
| </mediaobject> |
| </screenshot></para> |
| |
| |
| <para>Click one of the highlighted |
| annotations to see a list of all its features in the frame on the right.</para> |
| |
| <para>If there are multiple annotation types in the view, you |
| can control which ones are selected by using the checkboxes in the legend, the |
| Select All button, or the Deselect All button.</para> |
| |
| <para>If you are viewing a CAS that contains multiple subjects |
| of analysis (Sofas), a selector will appear at the bottom right of the Annotation |
| Viewer window, allowing you to |
| choose the Sofa that you wish to view. Note that only text Sofas containing a non-null document are available |
| for viewing.</para> |
| |
| </section> |
| |
| <section id="ugr.tools.doc_analyzer.configuring"> |
| <title>Configuring the Annotation Viewer</title> |
| |
| <para>The <quote>JV User Colors</quote> and the HTML viewer allow |
| you to specify exactly which colors are used to display each of your annotation |
| types. For the Java Viewer, you can also |
| specify which types should be initially selected, and you can hide types |
| entirely.</para> |
| |
| <para>To configure the viewer, click the <quote>Edit Style |
| Map</quote> button on the <quote>Analysis Results</quote> dialog. |
| You should see a dialog that looks like this: |
| |
| |
| <screenshot> |
| <mediaobject> |
| <imageobject> |
| <imagedata width="5.8in" format="JPG" fileref="&imgroot;image008.jpg"/> |
| </imageobject> |
| <textobject><phrase>Configuring the Analysis Results Viewer</phrase></textobject> |
| </mediaobject> |
| </screenshot></para> |
| |
| <para>To change the color assigned to a type, simply click on |
| the colored cell in the <quote>Background</quote> column for the type you wish to |
| edit. This will display a dialog that |
| allows you to choose the color. For the |
| HTML viewer only, you can also change the foreground color.</para> |
| |
| <para>If you would like the type to be initially checked |
| (selected) in the legend when the viewer is first launched, check the box in |
| the <quote>Checked</quote> column. If you |
| would like the type never to be shown in the viewer, check the box in the |
| <quote>Hidden</quote> column. These |
| settings affect only the Java Viewer, not the HTML viewer.</para> |
| |
| <para>When you are done editing, click the <quote>Save</quote> |
| button. This will save your choices to a |
| file in the same directory as your AE descriptor. From now on, when you view analysis results |
| produced by this AE using the <quote>JV User Colors</quote> or <quote>HTML</quote> |
| options, the viewer will be configured as you have specified.</para> |
| |
| </section> |
| |
| <section id="ugr.tools.doc_analyzer.interactive_mode"> |
| <title>Interactive Mode</title> |
| |
| |
| <para>Interactive Mode allows you to analyze text that you type |
| or cut-and-paste into the tool, rather than requiring that the documents be |
| stored as files.</para> |
| |
| <para>In the main Document Analyzer window, you can invoke |
| Interactive Mode by clicking the <quote>Interactive</quote> button instead of the |
| <quote>Run</quote> button. This will |
| display a dialog that looks like this: |
| |
| |
| <screenshot> |
| <mediaobject> |
| <imageobject> |
| <imagedata width="5.5in" format="JPG" fileref="&imgroot;image010.jpg"/> |
| </imageobject> |
| <textobject><phrase>Invoking Interactive Mode</phrase></textobject> |
| </mediaobject> |
| </screenshot></para> |
| |
| <para>You can type or cut-and-paste your text into this window, |
| then choose your Results Display Format and click the <quote>Analyze</quote> |
| button. Your AE will be run on the text |
| that you supplied and the results will be displayed as usual.</para> |
| |
| |
| </section> |
| |
| <section id="ugr.tools.doc_analyzer.view_mode"> |
| <title>View Mode</title> |
| |
| <para>If you have previously run an AE and saved its analysis |
| results, you can use the Document Analyzer's View mode to view those results |
| without re-running your analysis. To do |
| this, on the main Document Analyzer window, select the location of your |
| analyzed documents in the <quote>Output Directory</quote> field and click the |
| <quote>View</quote> button. You can then |
| view your analysis results as described in Section |
| <xref linkend="ugr.tools.doc_analyzer.viewing_results"/>.</para> |
| |
| </section> |
| </chapter> |
| |