blob: 1d92a844477e7922fb66c04e3e57e2e52bd0ae00 [file] [log] [blame]
<!doctype html>
<html lang="en" dir="ltr" class="docs-wrapper docs-doc-page docs-version-current plugin-docs plugin-id-default docs-doc-id-development/extensions-core/kafka-extraction-namespace">
<head>
<meta charset="UTF-8">
<meta name="generator" content="Docusaurus v2.4.1">
<title data-rh="true">Apache Kafka Lookups | Apache® Druid</title><meta data-rh="true" name="viewport" content="width=device-width,initial-scale=1"><meta data-rh="true" name="twitter:card" content="summary_large_image"><meta data-rh="true" property="og:image" content="https://druid.apache.org/img/druid_nav.png"><meta data-rh="true" name="twitter:image" content="https://druid.apache.org/img/druid_nav.png"><meta data-rh="true" property="og:url" content="https://druid.apache.org/docs/28.0.0/development/extensions-core/kafka-extraction-namespace"><meta data-rh="true" name="docusaurus_locale" content="en"><meta data-rh="true" name="docsearch:language" content="en"><meta data-rh="true" name="docusaurus_version" content="current"><meta data-rh="true" name="docusaurus_tag" content="docs-default-current"><meta data-rh="true" name="docsearch:version" content="current"><meta data-rh="true" name="docsearch:docusaurus_tag" content="docs-default-current"><meta data-rh="true" property="og:title" content="Apache Kafka Lookups | Apache® Druid"><meta data-rh="true" name="description" content="&lt;!--"><meta data-rh="true" property="og:description" content="&lt;!--"><link data-rh="true" rel="icon" href="/img/favicon.png"><link data-rh="true" rel="canonical" href="https://druid.apache.org/docs/28.0.0/development/extensions-core/kafka-extraction-namespace"><link data-rh="true" rel="alternate" href="https://druid.apache.org/docs/28.0.0/development/extensions-core/kafka-extraction-namespace" hreflang="en"><link data-rh="true" rel="alternate" href="https://druid.apache.org/docs/28.0.0/development/extensions-core/kafka-extraction-namespace" hreflang="x-default"><link rel="preconnect" href="https://www.google-analytics.com">
<link rel="preconnect" href="https://www.googletagmanager.com">
<script async src="https://www.googletagmanager.com/gtag/js?id=UA-131010415-1"></script>
<script>function gtag(){dataLayer.push(arguments)}window.dataLayer=window.dataLayer||[],gtag("js",new Date),gtag("config","UA-131010415-1",{})</script>
<link rel="stylesheet" href="https://use.fontawesome.com/releases/v5.7.2/css/all.css">
<script src="https://cdnjs.cloudflare.com/ajax/libs/clipboard.js/2.0.4/clipboard.min.js"></script><link rel="stylesheet" href="/assets/css/styles.546f39eb.css">
<link rel="preload" href="/assets/js/runtime~main.0dcbfdea.js" as="script">
<link rel="preload" href="/assets/js/main.7f6fdf81.js" as="script">
</head>
<body class="navigation-with-keyboard">
<script>!function(){function t(t){document.documentElement.setAttribute("data-theme",t)}var e=function(){var t=null;try{t=new URLSearchParams(window.location.search).get("docusaurus-theme")}catch(t){}return t}()||function(){var t=null;try{t=localStorage.getItem("theme")}catch(t){}return t}();t(null!==e?e:"light")}()</script><div id="__docusaurus">
<div role="region" aria-label="Skip to main content"><a class="skipToContent_fXgn" href="#__docusaurus_skipToContent_fallback">Skip to main content</a></div><nav aria-label="Main" class="navbar navbar--fixed-top navbar--dark"><div class="navbar__inner"><div class="navbar__items"><button aria-label="Toggle navigation bar" aria-expanded="false" class="navbar__toggle clean-btn" type="button"><svg width="30" height="30" viewBox="0 0 30 30" aria-hidden="true"><path stroke="currentColor" stroke-linecap="round" stroke-miterlimit="10" stroke-width="2" d="M4 7h22M4 15h22M4 23h22"></path></svg></button><a class="navbar__brand" href="/"><div class="navbar__logo"><img src="/img/druid_nav.png" alt="Apache® Druid" class="themedImage_ToTc themedImage--light_HNdA"><img src="/img/druid_nav.png" alt="Apache® Druid" class="themedImage_ToTc themedImage--dark_i4oU"></div></a></div><div class="navbar__items navbar__items--right"><a class="navbar__item navbar__link" href="/technology">Technology</a><a class="navbar__item navbar__link" href="/use-cases">Use Cases</a><a class="navbar__item navbar__link" href="/druid-powered">Powered By</a><a class="navbar__item navbar__link" href="/docs/28.0.0/design/">Docs</a><a class="navbar__item navbar__link" href="/community/">Community</a><div class="navbar__item dropdown dropdown--hoverable dropdown--right"><a href="#" aria-haspopup="true" aria-expanded="false" role="button" class="navbar__link">Apache®</a><ul class="dropdown__menu"><li><a href="https://www.apache.org/" target="_blank" rel="noopener noreferrer" class="dropdown__link">Foundation<svg width="12" height="12" aria-hidden="true" viewBox="0 0 24 24" class="iconExternalLink_nPIU"><path fill="currentColor" d="M21 13v10h-21v-19h12v2h-10v15h17v-8h2zm3-12h-10.988l4.035 4-6.977 7.07 2.828 2.828 6.977-7.07 4.125 4.172v-11z"></path></svg></a></li><li><a href="https://apachecon.com/?ref=druid.apache.org" target="_blank" rel="noopener noreferrer" class="dropdown__link">Events<svg width="12" height="12" aria-hidden="true" viewBox="0 0 24 24" class="iconExternalLink_nPIU"><path fill="currentColor" d="M21 13v10h-21v-19h12v2h-10v15h17v-8h2zm3-12h-10.988l4.035 4-6.977 7.07 2.828 2.828 6.977-7.07 4.125 4.172v-11z"></path></svg></a></li><li><a href="https://www.apache.org/licenses/" target="_blank" rel="noopener noreferrer" class="dropdown__link">License<svg width="12" height="12" aria-hidden="true" viewBox="0 0 24 24" class="iconExternalLink_nPIU"><path fill="currentColor" d="M21 13v10h-21v-19h12v2h-10v15h17v-8h2zm3-12h-10.988l4.035 4-6.977 7.07 2.828 2.828 6.977-7.07 4.125 4.172v-11z"></path></svg></a></li><li><a href="https://www.apache.org/foundation/thanks.html" target="_blank" rel="noopener noreferrer" class="dropdown__link">Thanks<svg width="12" height="12" aria-hidden="true" viewBox="0 0 24 24" class="iconExternalLink_nPIU"><path fill="currentColor" d="M21 13v10h-21v-19h12v2h-10v15h17v-8h2zm3-12h-10.988l4.035 4-6.977 7.07 2.828 2.828 6.977-7.07 4.125 4.172v-11z"></path></svg></a></li><li><a href="https://www.apache.org/security/" target="_blank" rel="noopener noreferrer" class="dropdown__link">Security<svg width="12" height="12" aria-hidden="true" viewBox="0 0 24 24" class="iconExternalLink_nPIU"><path fill="currentColor" d="M21 13v10h-21v-19h12v2h-10v15h17v-8h2zm3-12h-10.988l4.035 4-6.977 7.07 2.828 2.828 6.977-7.07 4.125 4.172v-11z"></path></svg></a></li><li><a href="https://www.apache.org/foundation/sponsorship.html" target="_blank" rel="noopener noreferrer" class="dropdown__link">Sponsorship<svg width="12" height="12" aria-hidden="true" viewBox="0 0 24 24" class="iconExternalLink_nPIU"><path fill="currentColor" d="M21 13v10h-21v-19h12v2h-10v15h17v-8h2zm3-12h-10.988l4.035 4-6.977 7.07 2.828 2.828 6.977-7.07 4.125 4.172v-11z"></path></svg></a></li></ul></div><a class="navbar__item navbar__link" href="/downloads/">Download</a><div class="searchBox_ZlJk"><div class="navbar__search"><span aria-label="expand searchbar" role="button" class="search-icon" tabindex="0"></span><input type="search" id="search_input_react" placeholder="Loading..." aria-label="Search" class="navbar__search-input search-bar" disabled=""></div></div></div></div><div role="presentation" class="navbar-sidebar__backdrop"></div></nav><div id="__docusaurus_skipToContent_fallback" class="main-wrapper mainWrapper_z2l0 docsWrapper_BCFX"><button aria-label="Scroll back to top" class="clean-btn theme-back-to-top-button backToTopButton_sjWU" type="button"></button><div class="docPage__5DB"><main class="docMainContainer_gTbr docMainContainerEnhanced_Uz_u"><div class="container padding-top--md padding-bottom--lg"><div class="row"><div class="col docItemCol_VOVn"><div class="docItemContainer_Djhp"><article><div class="tocCollapsible_ETCw theme-doc-toc-mobile tocMobile_ITEo"><button type="button" class="clean-btn tocCollapsibleButton_TO0P">On this page</button></div><div class="theme-doc-markdown markdown"><header><h1>Apache Kafka Lookups</h1></header><p>To use this Apache Druid extension, <a href="/docs/28.0.0/configuration/extensions#loading-extensions">include</a> <code>druid-lookups-cached-global</code> and <code>druid-kafka-extraction-namespace</code> in the extensions load list.</p><p>If you need updates to populate as promptly as possible, it is possible to plug into a Kafka topic whose key is the old value and message is the desired new value (both in UTF-8) as a LookupExtractorFactory.</p><div class="language-json codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#bfc7d5;--prism-background-color:#292d3e"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-json codeBlock_bY9V thin-scrollbar"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#bfc7d5"><span class="token punctuation" style="color:rgb(199, 146, 234)">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain"> </span><span class="token property">&quot;type&quot;</span><span class="token operator" style="color:rgb(137, 221, 255)">:</span><span class="token string" style="color:rgb(195, 232, 141)">&quot;kafka&quot;</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain"> </span><span class="token property">&quot;kafkaTopic&quot;</span><span class="token operator" style="color:rgb(137, 221, 255)">:</span><span class="token string" style="color:rgb(195, 232, 141)">&quot;testTopic&quot;</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain"> </span><span class="token property">&quot;kafkaProperties&quot;</span><span class="token operator" style="color:rgb(137, 221, 255)">:</span><span class="token punctuation" style="color:rgb(199, 146, 234)">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain"> </span><span class="token property">&quot;bootstrap.servers&quot;</span><span class="token operator" style="color:rgb(137, 221, 255)">:</span><span class="token string" style="color:rgb(195, 232, 141)">&quot;kafka.service:9092&quot;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain"> </span><span class="token punctuation" style="color:rgb(199, 146, 234)">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain"></span><span class="token punctuation" style="color:rgb(199, 146, 234)">}</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div><table><thead><tr><th>Parameter</th><th>Description</th><th>Required</th><th>Default</th></tr></thead><tbody><tr><td><code>kafkaTopic</code></td><td>The Kafka topic to read the data from</td><td>Yes</td><td></td></tr><tr><td><code>kafkaProperties</code></td><td>Kafka consumer properties (<code>bootstrap.servers</code> must be specified)</td><td>Yes</td><td></td></tr><tr><td><code>connectTimeout</code></td><td>How long to wait for an initial connection</td><td>No</td><td><code>0</code> (do not wait)</td></tr><tr><td><code>isOneToOne</code></td><td>The map is a one-to-one (see <a href="/docs/28.0.0/querying/dimensionspecs">Lookup DimensionSpecs</a>)</td><td>No</td><td><code>false</code></td></tr></tbody></table><p>The extension <code>kafka-extraction-namespace</code> enables reading from an <a href="https://kafka.apache.org/" target="_blank" rel="noopener noreferrer">Apache Kafka</a> topic which has name/key pairs to allow renaming of dimension values. An example use case would be to rename an ID to a human-readable format.</p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="how-it-works">How it Works<a href="#how-it-works" class="hash-link" aria-label="Direct link to How it Works" title="Direct link to How it Works"></a></h2><p>The extractor works by consuming the configured Kafka topic from the beginning, and appending every record to an internal map. The key of the Kafka record is used as they key of the map, and the payload of the record is used as the value. At query time, a lookup can be used to transform the key into the associated value. See <a href="/docs/28.0.0/querying/lookups">lookups</a> for how to configure and use lookups in a query. Keys and values are both stored as strings by the lookup extractor.</p><p>The extractor remains subscribed to the topic, so new records are added to the lookup map as they appear. This allows for lookup values to be updated in near-realtime. If two records are added to the topic with the same key, the record with the larger offset will replace the previous record in the lookup map. A record with a <code>null</code> payload will be treated as a tombstone record, and the associated key will be removed from the lookup map.</p><p>The extractor treats the input topic much like a <a href="https://kafka.apache.org/23/javadoc/org/apache/kafka/streams/kstream/KTable.html" target="_blank" rel="noopener noreferrer">KTable</a>. As such, it is best to create your Kafka topic using a <a href="https://kafka.apache.org/documentation/#compaction" target="_blank" rel="noopener noreferrer">log compaction</a> strategy, so that the most-recent version of a key is always preserved in Kafka. Without properly configuring retention and log compaction, older keys that are automatically removed from Kafka will not be available and will be lost when Druid services are restarted.</p><h3 class="anchor anchorWithStickyNavbar_LWe7" id="example">Example<a href="#example" class="hash-link" aria-label="Direct link to Example" title="Direct link to Example"></a></h3><p>Consider a <code>country_codes</code> topic is being consumed, and the following records are added to the topic in the following order:</p><table><thead><tr><th>Offset</th><th>Key</th><th>Payload</th></tr></thead><tbody><tr><td>1</td><td>NZ</td><td>Nu Zeelund</td></tr><tr><td>2</td><td>AU</td><td>Australia</td></tr><tr><td>3</td><td>NZ</td><td>New Zealand</td></tr><tr><td>4</td><td>AU</td><td><code>null</code></td></tr><tr><td>5</td><td>NZ</td><td>Aotearoa</td></tr><tr><td>6</td><td>CZ</td><td>Czechia</td></tr></tbody></table><p>This input topic would be consumed from the beginning, and result in a lookup namespace containing the following mappings (notice that the entry for <em>Australia</em> was added and then deleted):</p><table><thead><tr><th>Key</th><th>Value</th></tr></thead><tbody><tr><td>NZ</td><td>Aotearoa</td></tr><tr><td>CZ</td><td>Czechia</td></tr></tbody></table><p>Now when a query uses this extraction namespace, the country codes can be mapped to the full country name at query time.</p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="tombstones-and-deleting-records">Tombstones and Deleting Records<a href="#tombstones-and-deleting-records" class="hash-link" aria-label="Direct link to Tombstones and Deleting Records" title="Direct link to Tombstones and Deleting Records"></a></h2><p>The Kafka lookup extractor treats <code>null</code> Kafka messages as tombstones. This means that a record on the input topic with a <code>null</code> message payload on Kafka will remove the associated key from the lookup map, effectively deleting it.</p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="limitations">Limitations<a href="#limitations" class="hash-link" aria-label="Direct link to Limitations" title="Direct link to Limitations"></a></h2><p>The consumer properties <code>group.id</code>, <code>auto.offset.reset</code> and <code>enable.auto.commit</code> cannot be set in <code>kafkaProperties</code> as they are set by the extension as <code>UUID.randomUUID().toString()</code>, <code>earliest</code> and <code>false</code> respectively. This is because the entire topic must be consumed by the Druid service from the very beginning so that a complete map of lookup values can be built. Setting any of these consumer properties will cause the extractor to not start.</p><p>Currently, the Kafka lookup extractor feeds the entire Kafka topic into a local cache. If you are using on-heap caching, this can easily clobber your java heap if the Kafka stream spews a lot of unique keys. Off-heap caching should alleviate these concerns, but there is still a limit to the quantity of data that can be stored. There is currently no eviction policy.</p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="testing-the-kafka-rename-functionality">Testing the Kafka rename functionality<a href="#testing-the-kafka-rename-functionality" class="hash-link" aria-label="Direct link to Testing the Kafka rename functionality" title="Direct link to Testing the Kafka rename functionality"></a></h2><p>To test this setup, you can send key/value pairs to a Kafka stream via the following producer console:</p><div class="codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#bfc7d5;--prism-background-color:#292d3e"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#bfc7d5"><span class="token plain">./bin/kafka-console-producer.sh --property parse.key=true --property key.separator=&quot;-&gt;&quot; --broker-list localhost:9092 --topic testTopic</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div><p>Renames can then be published as <code>OLD_VAL-&gt;NEW_VAL</code> followed by newline (enter or return)</p></div></article><nav class="pagination-nav docusaurus-mt-lg" aria-label="Docs pages"></nav></div></div><div class="col col--3"><div class="tableOfContents_bqdL thin-scrollbar theme-doc-toc-desktop"><ul class="table-of-contents table-of-contents__left-border"><li><a href="#how-it-works" class="table-of-contents__link toc-highlight">How it Works</a><ul><li><a href="#example" class="table-of-contents__link toc-highlight">Example</a></li></ul></li><li><a href="#tombstones-and-deleting-records" class="table-of-contents__link toc-highlight">Tombstones and Deleting Records</a></li><li><a href="#limitations" class="table-of-contents__link toc-highlight">Limitations</a></li><li><a href="#testing-the-kafka-rename-functionality" class="table-of-contents__link toc-highlight">Testing the Kafka rename functionality</a></li></ul></div></div></div></div></main></div></div><footer class="footer"><div class="container container-fluid"><div class="footer__bottom text--center"><div class="margin-bottom--sm"><img src="/img/favicon.png" class="themedImage_ToTc themedImage--light_HNdA footer__logo"><img src="/img/favicon.png" class="themedImage_ToTc themedImage--dark_i4oU footer__logo"></div><div class="footer__copyright">Copyright © 2023 Apache Software Foundation. Except where otherwise noted, licensed under CC BY-SA 4.0. Apache Druid, Druid, and the Druid logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.</div></div></div></footer></div>
<script src="/assets/js/runtime~main.0dcbfdea.js"></script>
<script src="/assets/js/main.7f6fdf81.js"></script>
</body>
</html>