blob: 13b3011283d2d4e73246c8d4b8a6e310b0a52094 [file] [log] [blame] [view]
---
id: org.apache.streampipes.processors.textmining.jvm.partofspeech
title: Part of Speech (English)
sidebar_label: Part of Speech (English)
---
<!--
~ Licensed to the Apache Software Foundation (ASF) under one or more
~ contributor license agreements. See the NOTICE file distributed with
~ this work for additional information regarding copyright ownership.
~ The ASF licenses this file to You under the Apache License, Version 2.0
~ (the "License"); you may not use this file except in compliance with
~ the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing, software
~ distributed under the License is distributed on an "AS IS" BASIS,
~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~ See the License for the specific language governing permissions and
~ limitations under the License.
~
-->
<p align="center">
<img src="/img/pipeline-elements/org.apache.streampipes.processors.textmining.jvm.partofspeech/icon.png" width="150px;" class="pe-image-documentation"/>
</p>
***
## Description
Takes in a stream of tokens and marks each token with a part-of-speech tag
The list of used suffixes can be found [here](https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html)
***
## Required input
A stream with a list property which contains the tokens.
***
## Configuration
Simply assign the correct output of the previous stream to the part of speech detector input.
To use this component you have to download or train an openNLP model:
https://opennlp.apache.org/models.html
## Output
Appends two list properties to the stream:
1. String list: The tag for each token
2. Double list: The confidence for each tag that it is indeed the given tag (between 0 and 1)
**Example:**
Input: `(tokens: ["Hi", "Joe"])`
Output: `(tokens: ["Hi", "Joe"], tags: ["UH", "NNP"], confidence: [0.82, 0.87])`