<!DOCTYPE html><html lang="en"><head><meta charset="utf-8"></meta><title>GetHTMLElement</title><link rel="stylesheet" href="../../../../../css/component-usage.css" type="text/css"></link></head><script type="text/javascript">window.onload = function(){if(self==top) { document.getElementById('nameHeader').style.display = "inherit"; } }</script><body><h1 id="nameHeader" style="display: none;">GetHTMLElement</h1><h2>Description: </h2><p>Extracts HTML element values from the incoming flowfile's content using a CSS selector. The incoming HTML is first parsed into an HTML Document Object Model (DOM) so that elements can be selected in the same way that CSS selectors select the elements to which styles are applied. The resulting DOM is then queried using the user-defined CSS selector string. The query may return zero or more results. If no results are found, the flowfile is transferred to the "element not found" relationship. If one or more results are found, a new flowfile is created and emitted for each result. Each result is written either to the content or to an attribute of its new flowfile, as controlled by the "Destination" property; by default, the result is written to an attribute. A value may also be prepended or appended to each result by setting the "Prepend Element Value" or "Append Element Value" property. Prepended and appended values are treated as strings and concatenated with the result of the DOM query. A more thorough reference for the CSS selector syntax is available at "http://jsoup.org/apidocs/org/jsoup/select/Selector.html".</p><h3>Tags: </h3><p>get, html, dom, css, element</p><h3>Properties: </h3><p>In the table below, the names of required properties appear in <strong>bold</strong>. Any other properties (not in bold) are considered optional. 
The table also indicates any default values and whether a property supports the <a href="../../../../../html/expression-language-guide.html">NiFi Expression Language</a>.</p><table id="properties"><tr><th>Display Name</th><th>API Name</th><th>Default Value</th><th>Allowable Values</th><th>Description</th></tr><tr><td id="name"><strong>URL</strong></td><td>URL</td><td></td><td id="allowable-values"></td><td id="description">Base URL for the HTML page being parsed. This URL is used to resolve an absolute URL when an attribute value is extracted from an HTML element.<br/><strong>Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)</strong></td></tr><tr><td id="name"><strong>CSS Selector</strong></td><td>CSS Selector</td><td></td><td id="allowable-values"></td><td id="description">CSS selector string used to select the desired HTML element(s).<br/><strong>Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)</strong></td></tr><tr><td id="name"><strong>HTML Character Encoding</strong></td><td>HTML Character Encoding</td><td id="default-value">UTF-8</td><td id="allowable-values"></td><td id="description">Character encoding of the input HTML.</td></tr><tr><td id="name"><strong>Output Type</strong></td><td>Output Type</td><td id="default-value">HTML</td><td id="allowable-values"><ul><li>HTML</li><li>Text</li><li>Attribute</li><li>Data</li></ul></td><td id="description">Controls the type of DOM value that is retrieved from each selected HTML element.</td></tr><tr><td id="name"><strong>Destination</strong></td><td>Destination</td><td id="default-value">flowfile-attribute</td><td id="allowable-values"><ul><li>flowfile-attribute</li><li>flowfile-content</li></ul></td><td id="description">Controls whether the extracted element value is written to a flowfile attribute or to the flowfile content.</td></tr><tr><td id="name">Prepend Element Value</td><td>Prepend Element Value</td><td></td><td 
id="allowable-values"></td><td id="description">Prepends the specified value to each extracted element value.<br/><strong>Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)</strong></td></tr><tr><td id="name">Append Element Value</td><td>Append Element Value</td><td></td><td id="allowable-values"></td><td id="description">Appends the specified value to each extracted element value.<br/><strong>Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)</strong></td></tr><tr><td id="name">Attribute Name</td><td>Attribute Name</td><td></td><td id="allowable-values"></td><td id="description">Name of the attribute to retrieve from each selected HTML element. This value is used when "Output Type" is set to "Attribute". If the name is prefixed with 'abs:', the extracted attribute value is converted into an absolute URL using the base URL from the "URL" property.<br/><strong>Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)</strong></td></tr></table><h3>Relationships: </h3><table id="relationships"><tr><th>Name</th><th>Description</th></tr><tr><td>element not found</td><td>The element could not be found in the HTML document. The original HTML input remains unchanged in the FlowFile content. 
The 'original' relationship is not used in this case.</td></tr><tr><td>success</td><td>A successfully extracted HTML element value</td></tr><tr><td>original</td><td>The original HTML input</td></tr><tr><td>invalid html</td><td>The input HTML syntax is invalid</td></tr></table><h3>Reads Attributes: </h3>None specified.<h3>Writes Attributes: </h3><table id="writes-attributes"><tr><th>Name</th><th>Description</th></tr><tr><td>HTMLElement</td><td>Flowfile attribute in which the element value extracted using the CSS selector is placed when "Destination" is set to "flowfile-attribute".</td></tr></table><h3>State management: </h3>This component does not store state.<h3>Restricted: </h3>This component is not restricted.<h3>Input requirement: </h3>This component requires an incoming relationship.<h3>System Resource Considerations:</h3>None specified.<h3>See Also:</h3><p><a href="../org.apache.nifi.ModifyHTMLElement/index.html">ModifyHTMLElement</a>, <a href="../org.apache.nifi.PutHTMLElement/index.html">PutHTMLElement</a></p></body></html>