| <!DOCTYPE html> |
| <html lang="en"> |
| <head> |
| <meta charset="UTF-8" /> |
| <meta name="viewport" content="width=device-width, initial-scale=1.0"> |
| <meta name="description" content="Apache Druid"> |
| <meta name="keywords" content="druid,kafka,database,analytics,streaming,real-time,real time,apache,open source"> |
| <meta name="author" content="Apache Software Foundation"> |
| |
| <title>Druid | Tutorial: Updating existing data</title> |
| |
| <link rel="alternate" type="application/atom+xml" href="/feed"> |
| <link rel="shortcut icon" href="/img/favicon.png"> |
| |
| <link rel="stylesheet" href="https://use.fontawesome.com/releases/v5.7.2/css/all.css" integrity="sha384-fnmOCqbTlWIlj8LyTjo7mOUStjsKC4pOpQbqyi7RrhN7udi9RwhKkMHpvLbHG9Sr" crossorigin="anonymous"> |
| |
| <link href='//fonts.googleapis.com/css?family=Open+Sans+Condensed:300,700,300italic|Open+Sans:300italic,400italic,600italic,400,300,600,700' rel='stylesheet' type='text/css'> |
| |
| <link rel="stylesheet" href="/css/bootstrap-pure.css?v=1.1"> |
| <link rel="stylesheet" href="/css/base.css?v=1.1"> |
| <link rel="stylesheet" href="/css/header.css?v=1.1"> |
| <link rel="stylesheet" href="/css/footer.css?v=1.1"> |
| <link rel="stylesheet" href="/css/syntax.css?v=1.1"> |
| <link rel="stylesheet" href="/css/docs.css?v=1.1"> |
| |
| <script> |
| (function() { |
| var cx = '000162378814775985090:molvbm0vggm'; |
| var gcse = document.createElement('script'); |
| gcse.type = 'text/javascript'; |
| gcse.async = true; |
| gcse.src = (document.location.protocol == 'https:' ? 'https:' : 'http:') + |
| '//cse.google.com/cse.js?cx=' + cx; |
| var s = document.getElementsByTagName('script')[0]; |
| s.parentNode.insertBefore(gcse, s); |
| })(); |
| </script> |
| |
| |
| </head> |
| |
| <body> |
| <!-- Start page_header include --> |
| <script src="//ajax.googleapis.com/ajax/libs/jquery/2.2.4/jquery.min.js"></script> |
| |
| <div class="top-navigator"> |
| <div class="container"> |
| <div class="left-cont"> |
| <a class="logo" href="/"><span class="druid-logo"></span></a> |
| </div> |
| <div class="right-cont"> |
| <ul class="links"> |
| <li class=""><a href="/technology">Technology</a></li> |
| <li class=""><a href="/use-cases">Use Cases</a></li> |
| <li class=""><a href="/druid-powered">Powered By</a></li> |
| <li class=""><a href="/docs/latest/design/">Docs</a></li> |
| <li class=""><a href="/community/">Community</a></li> |
| <li class="header-dropdown"> |
| <a>Apache</a> |
| <div class="header-dropdown-menu"> |
| <a href="https://www.apache.org/" target="_blank">Foundation</a> |
| <a href="https://www.apache.org/events/current-event" target="_blank">Events</a> |
| <a href="https://www.apache.org/licenses/" target="_blank">License</a> |
| <a href="https://www.apache.org/foundation/thanks.html" target="_blank">Thanks</a> |
| <a href="https://www.apache.org/security/" target="_blank">Security</a> |
| <a href="https://www.apache.org/foundation/sponsorship.html" target="_blank">Sponsorship</a> |
| </div> |
| </li> |
| <li class=" button-link"><a href="/downloads.html">Download</a></li> |
| </ul> |
| </div> |
| </div> |
| <div class="action-button menu-icon"> |
| <span class="fa fa-bars"></span> MENU |
| </div> |
| <div class="action-button menu-icon-close"> |
| <span class="fa fa-times"></span> MENU |
| </div> |
| </div> |
| |
| <script type="text/javascript"> |
| var $menu = $('.right-cont'); |
| var $menuIcon = $('.menu-icon'); |
| var $menuIconClose = $('.menu-icon-close'); |
| |
| function showMenu() { |
| $menu.fadeIn(100); |
| $menuIcon.fadeOut(100); |
| $menuIconClose.fadeIn(100); |
| } |
| |
| $menuIcon.click(showMenu); |
| |
| function hideMenu() { |
| $menu.fadeOut(100); |
| $menuIconClose.fadeOut(100); |
| $menuIcon.fadeIn(100); |
| } |
| |
| $menuIconClose.click(hideMenu); |
| |
| $(window).resize(function() { |
| if ($(window).width() >= 840) { |
| $menu.fadeIn(100); |
| $menuIcon.fadeOut(100); |
| $menuIconClose.fadeOut(100); |
| } |
| else { |
| $menu.fadeOut(100); |
| $menuIcon.fadeIn(100); |
| $menuIconClose.fadeOut(100); |
| } |
| }); |
| </script> |
| |
| <!-- Stop page_header include --> |
| |
| |
| <div class="container doc-container"> |
| |
| |
| |
| |
| <p> Looking for the <a href="/docs/0.16.0-incubating/">latest stable documentation</a>?</p> |
| |
| |
| <div class="row"> |
| <div class="col-md-9 doc-content"> |
| <p> |
| <a class="btn btn-default btn-xs visible-xs-inline-block visible-sm-inline-block" href="#toc">Table of Contents</a> |
| </p> |
| <!-- |
| ~ Licensed to the Apache Software Foundation (ASF) under one |
| ~ or more contributor license agreements. See the NOTICE file |
| ~ distributed with this work for additional information |
| ~ regarding copyright ownership. The ASF licenses this file |
| ~ to you under the Apache License, Version 2.0 (the |
| ~ "License"); you may not use this file except in compliance |
| ~ with the License. You may obtain a copy of the License at |
| ~ |
| ~ http://www.apache.org/licenses/LICENSE-2.0 |
| ~ |
| ~ Unless required by applicable law or agreed to in writing, |
| ~ software distributed under the License is distributed on an |
| ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| ~ KIND, either express or implied. See the License for the |
| ~ specific language governing permissions and limitations |
| ~ under the License. |
| --> |
| |
| <h1 id="tutorial-updating-existing-data">Tutorial: Updating existing data</h1> |
| |
| <p>This tutorial demonstrates how to update existing data, showing both overwrites and appends.</p> |
| |
| <p>For this tutorial, we'll assume you've already downloaded Apache Druid (incubating) as described in |
| the <a href="index.html">single-machine quickstart</a> and have it running on your local machine. </p> |
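<p>If you want to check from the command line that the cluster is up before continuing, a small health-check helper like the one below is one option. It is a sketch that assumes two things: the quickstart's default Router address, <code>http://localhost:8888</code>, and Druid's <code>/status/health</code> endpoint, which responds with <code>true</code> while the process is running. The <code>druid_is_healthy</code> function and the <code>DRUID_ROUTER</code> variable are hypothetical names and are not part of the Druid distribution.</p>

```shell
# Hypothetical helper: exit status 0 only if the Druid process answers "true".
# DRUID_ROUTER is an assumed variable; it defaults to the quickstart Router address.
druid_is_healthy() {
  router="${DRUID_ROUTER:-http://localhost:8888}"
  # -s hides progress output; -f makes curl fail on HTTP error responses
  response=$(curl -sf "$router/status/health") || return 1
  [ "$response" = "true" ]
}
```

<p>For example, <code>druid_is_healthy &amp;&amp; echo "Druid is up"</code> prints the message only when the check succeeds. Set <code>DRUID_ROUTER</code> first if your Router listens elsewhere.</p>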
| |
| <p>It will also be helpful to have finished <a href="../tutorials/tutorial-batch.html">Tutorial: Loading a file</a>, <a href="../tutorials/tutorial-query.html">Tutorial: Querying data</a>, and <a href="../tutorials/tutorial-rollup.html">Tutorial: Rollup</a>.</p> |
| |
| <h2 id="overwrite">Overwrite</h2> |
| |
| <p>This section of the tutorial covers how to overwrite an existing interval of data.</p> |
| |
| <h3 id="load-initial-data">Load initial data</h3> |
| |
| <p>Let's load an initial data set that we will later overwrite and append to.</p> |
| |
| <p>The spec we'll use for this tutorial is located at <code>quickstart/tutorial/updates-init-index.json</code>. This spec creates a datasource called <code>updates-tutorial</code> from the <code>quickstart/tutorial/updates-data.json</code> input file.</p> |
| |
| <p>Let's submit that task:</p> |
| <div class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span>bin/post-index-task --file quickstart/tutorial/updates-init-index.json |
| </code></pre></div> |
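<p><code>bin/post-index-task</code> is a convenience script. If you would rather submit a spec with plain <code>curl</code>, one approach is to POST the spec file to the Overlord's task endpoint, <code>/druid/indexer/v1/task</code>. The sketch below assumes the quickstart's default Overlord address, <code>http://localhost:8081</code>. The <code>submit_task</code> helper and the <code>DRUID_OVERLORD</code> variable are hypothetical names. The helper returns as soon as the Overlord accepts the task, so the data may not be queryable yet when it finishes.</p>

```shell
# Hypothetical helper: POST an ingestion spec file to the Overlord's task API.
# On success the Overlord replies with a small JSON object holding the task ID.
# DRUID_OVERLORD is an assumed variable; it defaults to the quickstart address.
submit_task() {
  spec_file="$1"
  overlord="${DRUID_OVERLORD:-http://localhost:8081}"
  curl -sf -X POST \
    -H 'Content-Type: application/json' \
    -d @"$spec_file" \
    "$overlord/druid/indexer/v1/task"
}
```

<p>For example: <code>submit_task quickstart/tutorial/updates-init-index.json</code>. The reply should contain the new task's ID, which you can use to check on the task later.</p>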
| <p>Once the task completes, the datasource contains three initial rows, each with an "animal" dimension and "count" and "number" metrics:</p> |
| <div class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span>dsql> <span class="k">select</span> * from <span class="s2">"updates-tutorial"</span><span class="p">;</span> |
| ┌──────────────────────────┬──────────┬───────┬────────┐ |
| │ __time                   │ animal   │ count │ number │ |
| ├──────────────────────────┼──────────┼───────┼────────┤ |
| │ <span class="m">2018</span>-01-01T01:01:00.000Z │ tiger    │     <span class="m">1</span> │    <span class="m">100</span> │ |
| │ <span class="m">2018</span>-01-01T03:01:00.000Z │ aardvark │     <span class="m">1</span> │     <span class="m">42</span> │ |
| │ <span class="m">2018</span>-01-01T03:01:00.000Z │ giraffe  │     <span class="m">1</span> │  <span class="m">14124</span> │ |
| └──────────────────────────┴──────────┴───────┴────────┘ |
| Retrieved <span class="m">3</span> rows in <span class="m">1</span>.42s. |
| </code></pre></div> |
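<p>You can also send the same query over HTTP instead of running it through <code>dsql</code>. This is a minimal sketch that assumes the quickstart Router at <code>http://localhost:8888</code> forwards requests for Druid SQL's <code>/druid/v2/sql</code> endpoint to the Broker. The <code>druid_sql</code> helper and the <code>DRUID_ROUTER</code> variable are hypothetical names. The helper escapes backslashes and double quotes so the SQL can be embedded in a JSON string, and it assumes the query is written on a single line.</p>

```shell
# Hypothetical helper: run one Druid SQL statement over HTTP and print the reply.
# Backslashes and double quotes are escaped so the SQL fits in a JSON string;
# a query containing newlines would still produce invalid JSON.
druid_sql() {
  router="${DRUID_ROUTER:-http://localhost:8888}"
  escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  curl -sf -X POST \
    -H 'Content-Type: application/json' \
    -d "{\"query\":\"$escaped\"}" \
    "$router/druid/v2/sql"
}
```

<p>For example, <code>druid_sql 'select * from "updates-tutorial"'</code> should return the rows shown above as a JSON array of objects.</p>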
| <h3 id="overwrite-the-initial-data">Overwrite the initial data</h3> |
| |
| <p>To overwrite this data, we can submit another task for the same interval, but with different input data.</p> |
| |
| <p>The <code>quickstart/tutorial/updates-overwrite-index.json</code> spec will perform an overwrite on the <code>updates-tutorial</code> datasource.</p> |
| |
| <p>Note that this task reads input from <code>quickstart/tutorial/updates-data2.json</code>, and <code>appendToExisting</code> is set to <code>false</code> (indicating this is an overwrite).</p> |
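<p>Because <code>appendToExisting</code> decides whether the task replaces the existing data in its interval or adds to it, it is worth checking the flag before you submit a spec. The helper below is a rough, text-based sketch rather than a JSON parser. It assumes the key and its boolean value sit on the same line of the spec file. <code>append_mode</code> is a hypothetical name. If the key is missing, the helper prints <code>false</code>, which is the native index task's default.</p>

```shell
# Hypothetical helper: print the first appendToExisting value found in a spec
# file, or "false" (the native index task's default) if the key is absent.
# Text-based only: assumes the key and its value are on the same line.
append_mode() {
  value=$(awk '
    match($0, /"appendToExisting"[ \t]*:[ \t]*[a-z]+/) {
      s = substr($0, RSTART, RLENGTH)   # e.g. "appendToExisting" : false
      sub(/.*:[ \t]*/, "", s)           # keep only the value
      print s
      exit                              # first occurrence only
    }' "$1")
  printf '%s\n' "${value:-false}"
}
```

<p>Given the description above, <code>append_mode quickstart/tutorial/updates-overwrite-index.json</code> should print <code>false</code>.</p>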
| |
| <p>Let's submit that task:</p> |
| <div class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span>bin/post-index-task --file quickstart/tutorial/updates-overwrite-index.json |
| </code></pre></div> |
| <p>It may take a couple of minutes for the changes to take effect. Once Druid has loaded the new segment from this overwrite task, the "tiger" row has become "lion", the "aardvark" row has a different number, and the "giraffe" row is gone, with a new "bear" row in its place:</p> |
| <div class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span>dsql> <span class="k">select</span> * from <span class="s2">"updates-tutorial"</span><span class="p">;</span> |
| ┌──────────────────────────┬──────────┬───────┬────────┐ |
| │ __time                   │ animal   │ count │ number │ |
| ├──────────────────────────┼──────────┼───────┼────────┤ |
| │ <span class="m">2018</span>-01-01T01:01:00.000Z │ lion     │     <span class="m">1</span> │    <span class="m">100</span> │ |
| │ <span class="m">2018</span>-01-01T03:01:00.000Z │ aardvark │     <span class="m">1</span> │   <span class="m">9999</span> │ |
| │ <span class="m">2018</span>-01-01T04:01:00.000Z │ bear     │     <span class="m">1</span> │    <span class="m">111</span> │ |
| └──────────────────────────┴──────────┴───────┴────────┘ |
| Retrieved <span class="m">3</span> rows in <span class="m">0</span>.02s. |
| </code></pre></div> |
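<p>If you submitted the overwrite task with <code>curl</code> instead of <code>bin/post-index-task</code>, you may want to wait for it to finish before querying. The polling loop below is a sketch. It assumes the Overlord's <code>/druid/indexer/v1/task/{taskId}/status</code> endpoint, whose JSON reply includes <code>"SUCCESS"</code> or <code>"FAILED"</code> once the task is done. The <code>wait_for_task</code> name, the <code>DRUID_OVERLORD</code> variable, and the 5-second interval are hypothetical choices. The loop has no timeout. Also, as noted above, a successful task can still take a little while to become visible to queries.</p>

```shell
# Hypothetical helper: poll the Overlord until a task reports SUCCESS or FAILED.
# Returns 0 on SUCCESS, 1 on FAILED or if a status request itself fails.
wait_for_task() {
  task_id="$1"
  overlord="${DRUID_OVERLORD:-http://localhost:8081}"
  while :; do
    status=$(curl -sf "$overlord/druid/indexer/v1/task/$task_id/status") || return 1
    case "$status" in
      *'"SUCCESS"'*) return 0 ;;
      *'"FAILED"'*) return 1 ;;
    esac
    sleep 5   # not finished yet (e.g. RUNNING); check again shortly
  done
}
```

<p>Pass it the task ID from the Overlord's reply to the submission request, for example <code>wait_for_task "$TASK_ID"</code>, where <code>TASK_ID</code> is a placeholder you fill in.</p>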
| <h2 id="combine-old-data-with-new-data-and-overwrite">Combine old data with new data and overwrite</h2> |
| |
| <p>Let's try appending some new data to the <code>updates-tutorial</code> datasource now. We will add the data from <code>quickstart/tutorial/updates-data3.json</code>.</p> |
| |
| <p>The <code>quickstart/tutorial/updates-append-index.json</code> task spec has been configured to read from the existing <code>updates-tutorial</code> datasource and the <code>quickstart/tutorial/updates-data3.json</code> file. The task will combine data from the two input sources, and then overwrite the original data with the new combined data.</p> |
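<p>To see how a spec wires its inputs together without reading the whole file, you could list the <code>"type"</code> values it declares. For <code>updates-append-index.json</code>, the list should show the input types that read from the existing datasource and from the new file. The helper is a rough, text-based sketch that assumes each <code>"type"</code> key and its string value sit on the same line. <code>spec_types</code> is a hypothetical name.</p>

```shell
# Hypothetical helper: print every "type" value declared in a JSON spec, one per
# line, in file order. Text-based: only handles "type" : "value" on one line.
spec_types() {
  awk '{
    line = $0
    while (match(line, /"type"[ \t]*:[ \t]*"[^"]*"/)) {
      s = substr(line, RSTART, RLENGTH)
      line = substr(line, RSTART + RLENGTH)   # keep scanning after this match
      sub(/^"type"[ \t]*:[ \t]*"/, "", s)     # drop the key and opening quote
      sub(/"$/, "", s)                         # drop the closing quote
      print s
    }
  }' "$1"
}
```

<p>For example: <code>spec_types quickstart/tutorial/updates-append-index.json</code>.</p>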
| |
| <p>Let's submit that task:</p> |
| <div class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span>bin/post-index-task --file quickstart/tutorial/updates-append-index.json |
| </code></pre></div> |
| <p>Once Druid has loaded the new segment from this task, the new rows appear in the datasource. Note that rollup occurred for the "lion" row: the existing "lion" row and a "lion" row from the new input were combined into a single row, which now has a count of 2:</p> |
| <div class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span>dsql> <span class="k">select</span> * from <span class="s2">"updates-tutorial"</span><span class="p">;</span> |
| ┌──────────────────────────┬──────────┬───────┬────────┐ |
| │ __time                   │ animal   │ count │ number │ |
| ├──────────────────────────┼──────────┼───────┼────────┤ |
| │ <span class="m">2018</span>-01-01T01:01:00.000Z │ lion     │     <span class="m">2</span> │    <span class="m">400</span> │ |
| │ <span class="m">2018</span>-01-01T03:01:00.000Z │ aardvark │     <span class="m">1</span> │   <span class="m">9999</span> │ |
| │ <span class="m">2018</span>-01-01T04:01:00.000Z │ bear     │     <span class="m">1</span> │    <span class="m">111</span> │ |
| │ <span class="m">2018</span>-01-01T05:01:00.000Z │ mongoose │     <span class="m">1</span> │    <span class="m">737</span> │ |
| │ <span class="m">2018</span>-01-01T06:01:00.000Z │ snake    │     <span class="m">1</span> │   <span class="m">1234</span> │ |
| │ <span class="m">2018</span>-01-01T07:01:00.000Z │ octopus  │     <span class="m">1</span> │    <span class="m">115</span> │ |
| └──────────────────────────┴──────────┴───────┴────────┘ |
| Retrieved <span class="m">6</span> rows in <span class="m">0</span>.02s. |
| </code></pre></div> |
| <h2 id="append-to-the-data">Append to the data</h2> |
| |
| <p>Let's try another way of appending data.</p> |
| |
| <p>The <code>quickstart/tutorial/updates-append-index2.json</code> task spec reads input from <code>quickstart/tutorial/updates-data4.json</code> and will append its data to the <code>updates-tutorial</code> datasource. Note that <code>appendToExisting</code> is set to <code>true</code> in this spec.</p> |
| |
| <p>Let's submit that task:</p> |
| <div class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span>bin/post-index-task --file quickstart/tutorial/updates-append-index2.json |
| </code></pre></div> |
| <p>When the new data is loaded, we can see two additional rows after "octopus". Note that the new "bear" row with number 222 has not been rolled up with the existing "bear" row with number 111, because the new data is held in a separate segment.</p> |
| <div class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span>dsql> <span class="k">select</span> * from <span class="s2">"updates-tutorial"</span><span class="p">;</span> |
| ┌──────────────────────────┬──────────┬───────┬────────┐ |
| │ __time                   │ animal   │ count │ number │ |
| ├──────────────────────────┼──────────┼───────┼────────┤ |
| │ <span class="m">2018</span>-01-01T01:01:00.000Z │ lion     │     <span class="m">2</span> │    <span class="m">400</span> │ |
| │ <span class="m">2018</span>-01-01T03:01:00.000Z │ aardvark │     <span class="m">1</span> │   <span class="m">9999</span> │ |
| │ <span class="m">2018</span>-01-01T04:01:00.000Z │ bear     │     <span class="m">1</span> │    <span class="m">111</span> │ |
| │ <span class="m">2018</span>-01-01T05:01:00.000Z │ mongoose │     <span class="m">1</span> │    <span class="m">737</span> │ |
| │ <span class="m">2018</span>-01-01T06:01:00.000Z │ snake    │     <span class="m">1</span> │   <span class="m">1234</span> │ |
| │ <span class="m">2018</span>-01-01T07:01:00.000Z │ octopus  │     <span class="m">1</span> │    <span class="m">115</span> │ |
| │ <span class="m">2018</span>-01-01T04:01:00.000Z │ bear     │     <span class="m">1</span> │    <span class="m">222</span> │ |
| │ <span class="m">2018</span>-01-01T09:01:00.000Z │ falcon   │     <span class="m">1</span> │   <span class="m">1241</span> │ |
| └──────────────────────────┴──────────┴───────┴────────┘ |
| Retrieved <span class="m">8</span> rows in <span class="m">0</span>.02s. |
| </code></pre></div> |
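<p>One way to confirm that the appended rows live in a separate segment is to query the <code>sys.segments</code> system table. The sketch assumes a Druid version that includes the Druid SQL system tables, and the quickstart Router at <code>http://localhost:8888</code>. The <code>list_segments</code> helper and the <code>DRUID_ROUTER</code> variable are hypothetical names. After the append task, you would expect the new data to show up as an additional segment, with its own partition number, rather than as a change to a segment written earlier.</p>

```shell
# Hypothetical helper: list the segments Druid knows about for one datasource,
# using the sys.segments system table via the SQL HTTP endpoint.
# The name is pasted into a SQL string literal, so pass trusted input only.
list_segments() {
  datasource="${1:-updates-tutorial}"
  router="${DRUID_ROUTER:-http://localhost:8888}"
  payload="{\"query\":\"SELECT segment_id, partition_num, num_rows FROM sys.segments WHERE datasource = '$datasource'\"}"
  curl -sf -X POST \
    -H 'Content-Type: application/json' \
    -d "$payload" \
    "$router/druid/v2/sql"
}
```

<p>For example: <code>list_segments updates-tutorial</code>.</p>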
| <p>If we run a GroupBy query instead of a <code>select *</code>, we can see that the two "bear" rows are combined at query time:</p> |
| <div class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span>dsql> <span class="k">select</span> __time, animal, SUM<span class="o">(</span><span class="s2">"count"</span><span class="o">)</span>, SUM<span class="o">(</span><span class="s2">"number"</span><span class="o">)</span> from <span class="s2">"updates-tutorial"</span> group by __time, animal<span class="p">;</span> |
| ┌──────────────────────────┬──────────┬────────┬────────┐ |
| │ __time                   │ animal   │ EXPR<span class="nv">$2</span> │ EXPR<span class="nv">$3</span> │ |
| ├──────────────────────────┼──────────┼────────┼────────┤ |
| │ <span class="m">2018</span>-01-01T01:01:00.000Z │ lion     │      <span class="m">2</span> │    <span class="m">400</span> │ |
| │ <span class="m">2018</span>-01-01T03:01:00.000Z │ aardvark │      <span class="m">1</span> │   <span class="m">9999</span> │ |
| │ <span class="m">2018</span>-01-01T04:01:00.000Z │ bear     │      <span class="m">2</span> │    <span class="m">333</span> │ |
| │ <span class="m">2018</span>-01-01T05:01:00.000Z │ mongoose │      <span class="m">1</span> │    <span class="m">737</span> │ |
| │ <span class="m">2018</span>-01-01T06:01:00.000Z │ snake    │      <span class="m">1</span> │   <span class="m">1234</span> │ |
| │ <span class="m">2018</span>-01-01T07:01:00.000Z │ octopus  │      <span class="m">1</span> │    <span class="m">115</span> │ |
| │ <span class="m">2018</span>-01-01T09:01:00.000Z │ falcon   │      <span class="m">1</span> │   <span class="m">1241</span> │ |
| └──────────────────────────┴──────────┴────────┴────────┘ |
| Retrieved <span class="m">7</span> rows in <span class="m">0</span>.23s. |
| </code></pre></div> |
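<p>Conceptually, the query-time merge does the same arithmetic as ingestion-time rollup. Rows that share <code>__time</code> and <code>animal</code> collapse into one row, and their <code>count</code> and <code>number</code> values are summed. The awk sketch below illustrates this by grouping tab-separated rows taken from the <code>select *</code> output above. It is not how Druid executes queries, and <code>group_rows</code> is a hypothetical name.</p>

```shell
# Illustration only: group rows by (__time, animal) and sum count and number,
# mirroring the SQL GROUP BY with SUM() above. Input is tab-separated:
# __time, animal, count, number. Output keeps the order of first appearance.
group_rows() {
  awk -F '\t' -v OFS='\t' '
    {
      key = $1 OFS $2
      if (!(key in cnt)) order[++n] = key   # remember first-seen order
      cnt[key] += $3
      num[key] += $4
    }
    END {
      for (i = 1; i <= n; i++) print order[i], cnt[order[i]], num[order[i]]
    }'
}
```

<p>Fed the eight rows from the <code>select *</code> result, the helper prints seven rows. One of them is the "bear" row with a count of 2 and a number of 333, which matches the GroupBy result.</p>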
| </div> |
| <div class="col-md-3"> |
| <div class="searchbox"> |
| <gcse:searchbox-only></gcse:searchbox-only> |
| </div> |
| <div id="toc" class="nav toc hidden-print"> |
| </div> |
| </div> |
| </div> |
| </div> |
| |
| <!-- Start page_footer include --> |
| <footer class="druid-footer"> |
| <div class="container"> |
| <div class="text-center"> |
| <p> |
| <a href="/technology">Technology</a> ·  |
| <a href="/use-cases">Use Cases</a> ·  |
| <a href="/druid-powered">Powered by Druid</a> ·  |
| <a href="/docs/latest">Docs</a> ·  |
| <a href="/community/">Community</a> ·  |
| <a href="/downloads.html">Download</a> ·  |
| <a href="/faq">FAQ</a> |
| </p> |
| </div> |
| <div class="text-center"> |
| <a title="Join the user group" href="https://groups.google.com/forum/#!forum/druid-user" target="_blank"><span class="fa fa-comments"></span></a> ·  |
| <a title="Follow Druid" href="https://twitter.com/druidio" target="_blank"><span class="fab fa-twitter"></span></a> ·  |
| <a title="Download via Apache" href="https://www.apache.org/dyn/closer.cgi?path=/incubator/druid/0.16.0-incubating/apache-druid-0.16.0-incubating-bin.tar.gz" target="_blank"><span class="fas fa-feather"></span></a> ·  |
| <a title="GitHub" href="https://github.com/apache/incubator-druid" target="_blank"><span class="fab fa-github"></span></a> |
| </div> |
| <div class="text-center license"> |
| Copyright © 2019 <a href="https://www.apache.org/" target="_blank">Apache Software Foundation</a>.<br> |
| Except where otherwise noted, licensed under <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">CC BY-SA 4.0</a>.<br> |
| Apache Druid, Druid, and the Druid logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries. |
| </div> |
| </div> |
| </footer> |
| |
| <script async src="https://www.googletagmanager.com/gtag/js?id=UA-131010415-1"></script> |
| <script> |
| window.dataLayer = window.dataLayer || []; |
| function gtag(){dataLayer.push(arguments);} |
| gtag('js', new Date()); |
| gtag('config', 'UA-131010415-1'); |
| </script> |
| <script> |
| function trackDownload(type, url) { |
| ga('send', 'event', 'download', type, url); |
| } |
| </script> |
| <script src="//code.jquery.com/jquery.min.js"></script> |
| <script src="//maxcdn.bootstrapcdn.com/bootstrap/3.2.0/js/bootstrap.min.js"></script> |
| <script src="/assets/js/druid.js"></script> |
| <!-- stop page_footer include --> |
| |
| |
| <script> |
| $(function() { |
| $(".toc").load("/docs/0.14.1-incubating/toc.html"); |
| |
| // There is no way to tell when .gsc-input will be async loaded into the page so just try to set a placeholder until it works |
| var tries = 0; |
| var timer = setInterval(function() { |
| tries++; |
| if (tries > 300) clearInterval(timer); |
| var searchInput = $('input.gsc-input'); |
| if (searchInput.length) { |
| searchInput.attr('placeholder', 'Search'); |
| clearInterval(timer); |
| } |
| }, 100); |
| }); |
| </script> |
| </body> |
| </html> |