blob: 99f9414ae8b452167cb907f80379b5dc0ce5ef2c [file] [log] [blame]
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
[[_ugr.tools.uimafit.testing"]]
= Testing UIMA components
Writing tests without uimaFIT can be a laborious process that results in fragile tests that are very verbose and break easily when code is refactored. This page demonstrates how you can write tests that are both concise and robust. Here is an outline of how you might create a test for a UIMA component _without_ uimaFIT:
* write a descriptor file that configures your component appropriately for the test. This requires a minimum of 30-50 lines of XML.
* begin a test with 5-10 lines of code that instantiate the e.g. analysis engine.
* run the analysis engine against some text and test the contents of the CAS.
* repeat steps 1-3 for your next test usually by copying the descriptor file, renaming it, and changing e.g. configuration parameters.</para>
If you have gone through the pain of creating tests like these and then decided you should refactor your code, then you know how tedious it is to maintain them.
Instead of pasting variants of the setup code (see step 2) into other tests we began to create a library of utility methods that we could call which helped shorten our code. We extended these methods so that we could instantiate our components directly without a descriptor file. These utility methods became the initial core of uimaFIT.
== Examples
There are several examples that can be found in the _uimafit-examples_ module.
* There are a number of examples of unit tests in both the test suite for the _uimafit-core_ module and the _uimafit-examples_ module. In particular, there are some well-documented unit tests in the latter which can be found in `RoomNumberAnnotator1Test`.
* You can improve your testing strategy by introducing a `TestBase` class such as the one found in `ExamplesTestBase`. This class is intended as a super class for your other test classes and sets up a `JCas` that is always ready to use along with a `TypeSystemDescription` and a `TypePriorities`. An example test that subclasses from `ExamplesTestBase` is `RoomNumberAnnotator2Test`.
* Most analysis engines that you want to test will generally be downstream of many other components that add annotations to the CAS. These annotations will likely need to be in the CAS so that a downstream analysis engine will do something sensible. This poses a problem for tests because it may be undesirable to set up and run an entire pipeline every time you want to test a downstream analysis engine. Furthermore, such tests can become fragile in the face of behavior changes to upstream components. For this reason, it can be advantageous to serialize a CAS as an XMI file and use this as a starting point rather than running an entire pipeline. An example of this approach can be found in `XmiTest`.
== Tips & Tricks
The package <package>org.apache.uima.fit.testing</package> provides some utility classes that can be handy when writing tests for UIMA components. You may find the following suggestions useful:
* add a `TokenBuilder` to your `TestBase` class. An example of this can be found in `ComponentTestBase`. This makes it easy to add tokens and sentences to the CAS you are testing which is a common task for many tests.
* use a `JCasBuilder` to add text and annotations incrementally to a JCas instead of first setting the text and then adding all annotations.
* use a `CasDumpWriter` to write the CAS contents is a human readable format to a file or to the console. Compare this with a previously written and manually verified file to see if changes in the component result in changes of the components output.