| <!DOCTYPE html> |
| <html lang="en"> |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| http://www.apache.org/licenses/LICENSE-2.0 |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| <head> |
| <meta charset="utf-8" /> |
| <title>PutElasticsearchRecord</title> |
| <link rel="stylesheet" href="../../../../../css/component-usage.css" type="text/css" /> |
| </head> |
| <body> |
| <p> |
| This processor is for accessing the Elasticsearch Bulk API. It provides the ability to configure bulk operations on |
| a per-record basis which is what separates it from PutElasticsearchJson. For example, it is possible to define |
| multiple commands to index documents, followed by deletes, creates and update operations against the same index or |
| other indices as desired. |
| </p> |
| <p> |
| As part of the Elasticsearch REST API bundle, it uses a controller service to manage connection information and |
| that controller service is built on top of the official Elasticsearch client APIs. That provides features such as |
| automatic master detection against the cluster which is missing in the other bundles. |
| </p> |
| <p> |
| This processor builds one Elasticsearch Bulk API body per record set. Care should be taken to split up record sets |
| into appropriately-sized chunks so that NiFi does not run out of memory and the requests sent to Elasticsearch are |
| not too large for it to handle. When failures do occur, this processor is capable of attempting to write the records |
| that failed to an output record writer so that only failed records can be processed downstream or replayed. |
| </p> |
| <p> |
| The index, operation and (optional) type fields are configured with default values that can be overridden using |
| record path operations that find an index or type value in the record set. |
| The ID and operation type (create, index, update, upsert or delete) can also be extracted in a similar fashion from |
| the record set. |
| An "@timestamp" field can be added to the data either using a default or by extracting it from the record set. |
| This is useful if the documents are being indexed into an Elasticsearch Data Stream. |
| The following is an example of a document exercising all of these features: |
| </p> |
| <pre> |
| { |
| "metadata": { |
| "id": "12345", |
| "index": "test", |
| "type": "message", |
| "operation": "index" |
| }, |
| "message": "Hello, world", |
| "from": "john.smith", |
| "ts": "2021-12-03'T'14:00:00.000Z" |
| } |
| </pre> |
| <pre> |
| { |
| "metadata": { |
| "id": "12345", |
| "index": "test", |
| "type": "message", |
| "operation": "delete" |
| } |
| } |
| </pre> |
| <p>The record path operations below would extract the relevant data:</p> |
| <ul> |
| <li>/metadata/id</li> |
| <li>/metadata/index</li> |
| <li>metadata/type</li> |
| <li>metadata/operation</li> |
| <li>/ts</li> |
| </ul> |
| <p>Valid values for "operation" are:</p> |
| <ul> |
| <li>create</li> |
| <li>delete</li> |
| <li>index</li> |
| <li>update</li> |
| <li>upsert</li> |
| </ul> |
| </body> |
| </html> |