id: org.apache.streampipes.processors.transformation.flink.processor.boilerplate
title: Boilerplate Removal
sidebar_label: Boilerplate Removal


Description

Removes boilerplate from HTML (markup that is not part of the main content) and extracts the full text.


Required input

Requires a text field containing the HTML.


Configuration

Select the extractor type and the output mode.

Output

Appends a new text field containing the content of the HTML page without the boilerplate.
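
To illustrate the input and output contract described above, here is a minimal Python sketch. It is not the processor's implementation and does not reproduce its extractor types or output modes. The function name `remove_boilerplate`, the field names `html` and `fulltext`, and the list of tags treated as boilerplate are all assumptions made for illustration.

```python
from html.parser import HTMLParser

# Tags whose content this sketch treats as boilerplate. This list is an
# assumption for illustration, not the heuristics used by the processor.
SKIPPED_TAGS = {"script", "style", "nav", "header", "footer", "aside"}


class _TextCollector(HTMLParser):
    """Collects text that is not nested inside a skipped tag."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def remove_boilerplate(event, html_field="html", output_field="fulltext"):
    """Return a copy of the event with the extracted text appended.

    The field names are hypothetical defaults; the input event is not modified.
    """
    parser = _TextCollector()
    parser.feed(event[html_field])
    parser.close()
    enriched = dict(event)
    enriched[output_field] = " ".join(parser.chunks)
    return enriched


event = {
    "id": 1,
    "html": "<html><body><nav>Home | About</nav>"
            "<p>Main article text.</p><footer>Imprint</footer></body></html>",
}
result = remove_boilerplate(event)
print(result["fulltext"])  # prints "Main article text."
```

As in the processor's described behavior, the original HTML field stays in the event and the extracted text is added as a separate field.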