---
id: version-0.65.0-pre-asf-org.apache.streampipes.processors.textmining.jvm.tokenizer
title: Tokenizer (English)
sidebar_label: Tokenizer (English)
original_id: org.apache.streampipes.processors.textmining.jvm.tokenizer
---

Segments a given text into tokens (usually words, numbers, and punctuation marks). Works best with English text.

The processor requires an input stream with a string property that contains the text to tokenize.

To configure it, assign the text property from the output of the previous stream to the tokenizer input.

The processor adds a list property to the stream that contains all tokens of the corresponding text.

Example:

Input: (text: "Hi, how are you?")

Output: (text: "Hi, how are you?", tokens: ["Hi", ",", "how", "are", "you", "?"])
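
To make the input and output shape concrete, the following is a minimal Python sketch that reproduces the example above. It is not the processor's implementation: the regular expression, the function names `tokenize` and `add_tokens`, and the field names passed as parameters are assumptions chosen for illustration only.

```python
import re

# Hypothetical pattern (an assumption, not the processor's real logic):
# match a run of word characters, or any single character that is
# neither a word character nor whitespace (i.e. a punctuation mark).
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Split text into word/number tokens and single punctuation marks."""
    return TOKEN_PATTERN.findall(text)


def add_tokens(event, text_field="text", output_field="tokens"):
    """Return a copy of the event with the token list added to it."""
    enriched = dict(event)  # leave the incoming event unmodified
    enriched[output_field] = tokenize(event[text_field])
    return enriched


event = {"text": "Hi, how are you?"}
print(add_tokens(event))
# {'text': 'Hi, how are you?', 'tokens': ['Hi', ',', 'how', 'are', 'you', '?']}
```

The `text_field` parameter plays the role of the input mapping described above. A simple pattern like this one splits contractions such as "don't" into three tokens, so results for other inputs may differ from what the processor produces.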