| <div class="main-wrapper blog-wrapper blog-post-page"><div class="container margin-vert--lg"><div class="row"><main class="col col--7"><article><header><h1 class="blogPostTitle_d4p0">Apache Pinot™ 0.11 - Inserts from SQL</h1><div class="blogPostData_-Im+ margin-vert--md"><time datetime="2022-11-17T00:00:00.000Z">November 17, 2022</time> · 4 min read</div><div class="avatar margin-vert--md"><a href="https://www.linkedin.com/in/markhneedham/" target="_blank" rel="noopener noreferrer" class="avatar__photo-link avatar__photo"><img src="https://www.datocms-assets.com/75153/1661544338-mark-needham.png" alt="Mark Needham"></a><div class="avatar__intro"><div class="avatar__name"><a href="https://www.linkedin.com/in/markhneedham/" target="_blank" rel="noopener noreferrer">Mark Needham</a></div></div></div></header><div class="markdown"><p>The Apache Pinot community recently released version <a href="https://medium.com/apache-pinot-developer-blog/apache-pinot-0-11-released-d564684df5d4" target="_blank" rel="noopener noreferrer">0.11.0</a>, which has lots of goodies for you to play with. 
This is the second in a series of blog posts showing off some of the new features in this release.</p><p>In this post, we’re going to explore the <a href="https://docs.pinot.apache.org/basics/data-import/from-query-console" target="_blank" rel="noopener noreferrer">INSERT INTO clause</a>, which makes ingesting batch data into Pinot as easy as writing a SQL query.</p><h2><a aria-hidden="true" tabindex="-1" class="anchor" id="batch-importing-the-job-specification"></a>Batch importing: The Job Specification<a class="hash-link" href="#batch-importing-the-job-specification" title="Direct link to heading">#</a></h2><p>We can only fully appreciate the power of this new clause if we look at what we had to do before it existed.</p><p>In the <a href="https://www.youtube.com/watch?v=1EMBx1XeI9o" target="_blank" rel="noopener noreferrer">Batch Import JSON from Amazon S3 into Apache Pinot | StarTree Recipes</a> video (and the <a href="https://dev.startree.ai/docs/pinot/recipes/ingest-csv-files-from-s3" target="_blank" rel="noopener noreferrer">accompanying developer guide</a>), we showed how to ingest data into Pinot from an S3 bucket.</p><p>The contents of that bucket are shown in the screenshot below:</p><p><img src="https://www.datocms-assets.com/75153/1668701275-image4.png" alt="Sample data ingested into Apache Pinot from an S3 bucket" title="Sample data ingested into Apache Pinot from an S3 bucket"></p><p>Let’s quickly recap the steps we had to take to import those files into Pinot. We have a table called events, which has the following schema:</p><p><img src="https://www.datocms-assets.com/75153/1668701353-image1.png" alt="Events schema table" title="Events schema table"></p><p>We first created a job specification file, which contains a description of our import job. 
The job file is shown below:</p><div class="codeBlockContainer_J+bg"><div class="codeBlockContent_csEI yaml"><pre tabindex="0" class="prism-code language-yaml codeBlock_rtdJ thin-scrollbar" style="color:#F8F8F2;background-color:#282A36"><code class="codeBlockLines_1zSZ"><span class="token-line" style="color:#F8F8F2"><span class="token key atrule">executionFrameworkSpec</span><span class="token punctuation" style="color:rgb(248, 248, 242)">:</span><span class="token plain"></span></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"> </span><span class="token key atrule">name</span><span class="token punctuation" style="color:rgb(248, 248, 242)">:</span><span class="token plain"> </span><span class="token string" style="color:rgb(255, 121, 198)">'standalone'</span><span class="token plain"></span></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"> </span><span class="token key atrule">segmentGenerationJobRunnerClassName</span><span class="token punctuation" style="color:rgb(248, 248, 242)">:</span><span class="token plain"> </span><span class="token string" style="color:rgb(255, 121, 198)">'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'</span><span class="token plain"></span></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"> </span><span class="token key atrule">segmentTarPushJobRunnerClassName</span><span class="token punctuation" style="color:rgb(248, 248, 242)">:</span><span class="token plain"> </span><span class="token string" style="color:rgb(255, 121, 198)">'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'</span><span class="token plain"></span></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"> </span><span class="token key atrule">segmentUriPushJobRunnerClassName</span><span class="token punctuation" style="color:rgb(248, 248, 242)">:</span><span class="token plain"> </span><span 
class="token string" style="color:rgb(255, 121, 198)">'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'</span><span class="token plain"></span></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"></span><span class="token key atrule">jobType</span><span class="token punctuation" style="color:rgb(248, 248, 242)">:</span><span class="token plain"> SegmentCreationAndTarPush</span></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"></span><span class="token key atrule">inputDirURI</span><span class="token punctuation" style="color:rgb(248, 248, 242)">:</span><span class="token plain"> </span><span class="token string" style="color:rgb(255, 121, 198)">'s3://marks-st-cloud-bucket/events/'</span><span class="token plain"></span></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"></span><span class="token key atrule">includeFileNamePattern</span><span class="token punctuation" style="color:rgb(248, 248, 242)">:</span><span class="token plain"> </span><span class="token string" style="color:rgb(255, 121, 198)">'glob:**/*.json'</span><span class="token plain"></span></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"></span><span class="token key atrule">outputDirURI</span><span class="token punctuation" style="color:rgb(248, 248, 242)">:</span><span class="token plain"> </span><span class="token string" style="color:rgb(255, 121, 198)">'/data'</span><span class="token plain"></span></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"></span><span class="token key atrule">overwriteOutput</span><span class="token punctuation" style="color:rgb(248, 248, 242)">:</span><span class="token plain"> </span><span class="token boolean important">true</span><span class="token plain"></span></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"></span><span class="token key 
atrule">pinotFSSpecs</span><span class="token punctuation" style="color:rgb(248, 248, 242)">:</span><span class="token plain"></span></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"> </span><span class="token punctuation" style="color:rgb(248, 248, 242)">-</span><span class="token plain"> </span><span class="token key atrule">scheme</span><span class="token punctuation" style="color:rgb(248, 248, 242)">:</span><span class="token plain"> s3</span></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"> </span><span class="token key atrule">className</span><span class="token punctuation" style="color:rgb(248, 248, 242)">:</span><span class="token plain"> org.apache.pinot.plugin.filesystem.S3PinotFS</span></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"> </span><span class="token key atrule">configs</span><span class="token punctuation" style="color:rgb(248, 248, 242)">:</span><span class="token plain"></span></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"> </span><span class="token key atrule">region</span><span class="token punctuation" style="color:rgb(248, 248, 242)">:</span><span class="token plain"> </span><span class="token string" style="color:rgb(255, 121, 198)">'eu-west-2'</span><span class="token plain"></span></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"> </span><span class="token punctuation" style="color:rgb(248, 248, 242)">-</span><span class="token plain"> </span><span class="token key atrule">scheme</span><span class="token punctuation" style="color:rgb(248, 248, 242)">:</span><span class="token plain"> file</span></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"> </span><span class="token key atrule">className</span><span class="token punctuation" style="color:rgb(248, 248, 242)">:</span><span class="token plain"> 
org.apache.pinot.spi.filesystem.LocalPinotFS</span></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"></span><span class="token key atrule">recordReaderSpec</span><span class="token punctuation" style="color:rgb(248, 248, 242)">:</span><span class="token plain"></span></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"> </span><span class="token key atrule">dataFormat</span><span class="token punctuation" style="color:rgb(248, 248, 242)">:</span><span class="token plain"> </span><span class="token string" style="color:rgb(255, 121, 198)">'json'</span><span class="token plain"></span></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"> </span><span class="token key atrule">className</span><span class="token punctuation" style="color:rgb(248, 248, 242)">:</span><span class="token plain"> </span><span class="token string" style="color:rgb(255, 121, 198)">'org.apache.pinot.plugin.inputformat.json.JSONRecordReader'</span><span class="token plain"></span></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"></span><span class="token key atrule">tableSpec</span><span class="token punctuation" style="color:rgb(248, 248, 242)">:</span><span class="token plain"></span></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"> </span><span class="token key atrule">tableName</span><span class="token punctuation" style="color:rgb(248, 248, 242)">:</span><span class="token plain"> </span><span class="token string" style="color:rgb(255, 121, 198)">'events'</span><span class="token plain"></span></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"></span><span class="token key atrule">pinotClusterSpecs</span><span class="token punctuation" style="color:rgb(248, 248, 242)">:</span><span class="token plain"></span></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"> </span><span class="token 
punctuation" style="color:rgb(248, 248, 242)">-</span><span class="token plain"> </span><span class="token key atrule">controllerURI</span><span class="token punctuation" style="color:rgb(248, 248, 242)">:</span><span class="token plain"> </span><span class="token string" style="color:rgb(255, 121, 198)">'http://${PINOT_CONTROLLER}:9000'</span></span></code></pre><button type="button" aria-label="Copy code to clipboard" class="copyButton_M3SB clean-btn">Copy</button></div></div><p>At a high level, this file describes a batch import job that will ingest files from the S3 bucket at s3://marks-st-cloud-bucket/events/ where the files match the glob:**/*.json pattern.</p><p>We can import the data by running the following command from the terminal:</p><div class="codeBlockContainer_J+bg"><div class="codeBlockContent_csEI bash"><pre tabindex="0" class="prism-code language-bash codeBlock_rtdJ thin-scrollbar" style="color:#F8F8F2;background-color:#282A36"><code class="codeBlockLines_1zSZ"><span class="token-line" style="color:#F8F8F2"><span class="token function" style="color:rgb(80, 250, 123)">docker</span><span class="token plain"> run </span><span class="token punctuation" style="color:rgb(248, 248, 242)">\</span><span class="token plain"></span></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"> --network ingest-json-files-s3 </span><span class="token punctuation" style="color:rgb(248, 248, 242)">\</span><span class="token plain"></span></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"> -v </span><span class="token environment constant" style="color:rgb(189, 147, 249)">$PWD</span><span class="token plain">/config:/config </span><span class="token punctuation" style="color:rgb(248, 248, 242)">\</span><span class="token plain"></span></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"> -e </span><span class="token assign-left variable" style="color:rgb(189, 147, 
249);font-style:italic">AWS_ACCESS_KEY_ID</span><span class="token operator">=</span><span class="token plain">AKIARCOCT6DWLUB7F77Z </span><span class="token punctuation" style="color:rgb(248, 248, 242)">\</span><span class="token plain"></span></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"> -e </span><span class="token assign-left variable" style="color:rgb(189, 147, 249);font-style:italic">AWS_SECRET_ACCESS_KEY</span><span class="token operator">=</span><span class="token plain">gfz71RX+Tj4udve43YePCBqMsIeN1PvHXrVFyxJS </span><span class="token punctuation" style="color:rgb(248, 248, 242)">\</span><span class="token plain"></span></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"> apachepinot/pinot:0.11.0 LaunchDataIngestionJob </span><span class="token punctuation" style="color:rgb(248, 248, 242)">\</span><span class="token plain"></span></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"> -jobSpecFile /config/job-spec.yml </span><span class="token punctuation" style="color:rgb(248, 248, 242)">\</span><span class="token plain"></span></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"> -values </span><span class="token assign-left variable" style="color:rgb(189, 147, 249);font-style:italic">PINOT_CONTROLLER</span><span class="token operator">=</span><span class="token plain">pinot-controller</span></span></code></pre><button type="button" aria-label="Copy code to clipboard" class="copyButton_M3SB clean-btn">Copy</button></div></div><p>And don’t worry, those credentials have already been deleted; I find it easier to understand what values go where if we use real values. 
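</p><p>If you’d rather not paste keys into the command at all, here is a minimal sketch of the same job that reads them from your shell instead. It assumes AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are already exported, and the function name run_ingestion_job is made up for illustration; everything else mirrors the command above.</p>

```shell
# Hypothetical wrapper around the docker command shown above.
run_ingestion_job() {
  # Stop early if either credential is unset or empty.
  # Both must be exported for Docker to see them.
  : "${AWS_ACCESS_KEY_ID:?export AWS_ACCESS_KEY_ID first}"
  : "${AWS_SECRET_ACCESS_KEY:?export AWS_SECRET_ACCESS_KEY first}"

  # "-e NAME" without a value asks Docker to copy NAME from the calling
  # environment, so the secrets never appear on the command line.
  docker run \
    --network ingest-json-files-s3 \
    -v "$PWD/config:/config" \
    -e AWS_ACCESS_KEY_ID \
    -e AWS_SECRET_ACCESS_KEY \
    apachepinot/pinot:0.11.0 LaunchDataIngestionJob \
    -jobSpecFile /config/job-spec.yml \
    -values PINOT_CONTROLLER=pinot-controller
}
```

<p>Calling run_ingestion_job should behave like the original command, and it stops with an error message if either variable is missing.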
</p><p>Once we’ve run this command, if we go to the Pinot UI at <a href="http://localhost:9000/" target="_blank" rel="noopener noreferrer">http://localhost:9000</a> and click through to the events table from the Query Console menu, we’ll see that the records have been imported, as shown in the screenshot below:</p><p><img src="https://www.datocms-assets.com/75153/1668701512-image3.png" alt="Sample imported records shown in the Apache Pinot Query Console menu" title="Sample imported records shown in the Apache Pinot Query Console menu"></p><p>This approach works, and we may still prefer to use it when we need fine-grained control over the ingestion parameters, but it is a bit heavyweight for your everyday data import!</p><h2><a aria-hidden="true" tabindex="-1" class="anchor" id="batch-importing-with-sql"></a>Batch Importing with SQL<a class="hash-link" href="#batch-importing-with-sql" title="Direct link to heading">#</a></h2><p>Now let’s do the same thing in SQL.</p><p>There are some prerequisites to using the SQL approach, so let’s go through them now to save you from a bunch of exceptions when you try this out! 
</p><p>First of all, you must have a <a href="https://docs.pinot.apache.org/basics/components/minion" target="_blank" rel="noopener noreferrer">Minion</a> in the Pinot cluster, as this is the component that will do the data import.</p><p>You’ll also need to include the following in your table config:</p><div class="codeBlockContainer_J+bg"><div class="codeBlockContent_csEI json"><pre tabindex="0" class="prism-code language-json codeBlock_rtdJ thin-scrollbar" style="color:#F8F8F2;background-color:#282A36"><code class="codeBlockLines_1zSZ"><span class="token-line" style="color:#F8F8F2"><span class="token property">"task"</span><span class="token operator">:</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(248, 248, 242)">{</span><span class="token plain"></span></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"> </span><span class="token property">"taskTypeConfigsMap"</span><span class="token operator">:</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(248, 248, 242)">{</span><span class="token plain"> </span><span class="token property">"SegmentGenerationAndPushTask"</span><span class="token operator">:</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(248, 248, 242)">{</span><span class="token punctuation" style="color:rgb(248, 248, 242)">}</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(248, 248, 242)">}</span><span class="token plain"></span></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"></span><span class="token punctuation" style="color:rgb(248, 248, 242)">}</span></span></code></pre><button type="button" aria-label="Copy code to clipboard" class="copyButton_M3SB clean-btn">Copy</button></div></div><p>As long as you’ve done those two things, we’re ready to write our import query! 
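</p><p>If you want to confirm both prerequisites before going further, a rough sketch like the one below may help. It assumes the controller is reachable at http://localhost:9000, that its REST API lists instances at /instances and returns the table config at /tables/events, and that Minion instance names start with Minion_. The check_prerequisites function is hypothetical, so treat this as a starting point rather than an official health check.</p>

```shell
# Controller address; defaults to the one used throughout this post.
CONTROLLER="${CONTROLLER:-http://localhost:9000}"

# Hypothetical helper: prints one line per prerequisite for the given table.
check_prerequisites() {
  table="$1"

  # Assumption: GET /instances lists every instance in the cluster,
  # and Minion instance names start with "Minion_".
  if curl -s "$CONTROLLER/instances" | grep -q 'Minion_'; then
    echo "minion: found"
  else
    echo "minion: missing"
  fi

  # Assumption: GET /tables/NAME returns the table config as JSON,
  # so the task type appears in the response once it is configured.
  if curl -s "$CONTROLLER/tables/$table" | grep -q '"SegmentGenerationAndPushTask"'; then
    echo "task config: found"
  else
    echo "task config: missing"
  fi
}
```

<p>Running check_prerequisites events prints one line per prerequisite. With both in place, we can move on to the query itself.</p><p>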
A query that imports JSON files from my S3 bucket is shown below:</p><div class="codeBlockContainer_J+bg"><div class="codeBlockContent_csEI sql"><pre tabindex="0" class="prism-code language-sql codeBlock_rtdJ thin-scrollbar" style="color:#F8F8F2;background-color:#282A36"><code class="codeBlockLines_1zSZ"><span class="token-line" style="color:#F8F8F2"><span class="token keyword" style="color:rgb(189, 147, 249);font-style:italic">INSERT</span><span class="token plain"> </span><span class="token keyword" style="color:rgb(189, 147, 249);font-style:italic">INTO</span><span class="token plain"> events</span></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"></span><span class="token keyword" style="color:rgb(189, 147, 249);font-style:italic">FROM</span><span class="token plain"> </span><span class="token keyword" style="color:rgb(189, 147, 249);font-style:italic">FILE</span><span class="token plain"> </span><span class="token string" style="color:rgb(255, 121, 198)">'s3://marks-st-cloud-bucket/events/'</span><span class="token plain"></span></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"></span><span class="token keyword" style="color:rgb(189, 147, 249);font-style:italic">OPTION</span><span class="token punctuation" style="color:rgb(248, 248, 242)">(</span><span class="token plain"></span></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"> taskName</span><span class="token operator">=</span><span class="token plain">events</span><span class="token operator">-</span><span class="token plain">task</span><span class="token punctuation" style="color:rgb(248, 248, 242)">,</span><span class="token plain"></span></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"> includeFileNamePattern</span><span class="token operator">=</span><span class="token plain">glob:</span><span class="token operator">*</span><span class="token operator">*</span><span class="token 
operator">/</span><span class="token operator">*</span><span class="token punctuation" style="color:rgb(248, 248, 242)">.</span><span class="token plain">json</span><span class="token punctuation" style="color:rgb(248, 248, 242)">,</span><span class="token plain"></span></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"> input</span><span class="token punctuation" style="color:rgb(248, 248, 242)">.</span><span class="token plain">fs</span><span class="token punctuation" style="color:rgb(248, 248, 242)">.</span><span class="token plain">className</span><span class="token operator">=</span><span class="token plain">org</span><span class="token punctuation" style="color:rgb(248, 248, 242)">.</span><span class="token plain">apache</span><span class="token punctuation" style="color:rgb(248, 248, 242)">.</span><span class="token plain">pinot</span><span class="token punctuation" style="color:rgb(248, 248, 242)">.</span><span class="token plain">plugin</span><span class="token punctuation" style="color:rgb(248, 248, 242)">.</span><span class="token plain">filesystem</span><span class="token punctuation" style="color:rgb(248, 248, 242)">.</span><span class="token plain">S3PinotFS</span><span class="token punctuation" style="color:rgb(248, 248, 242)">,</span><span class="token plain"></span></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"> input</span><span class="token punctuation" style="color:rgb(248, 248, 242)">.</span><span class="token plain">fs</span><span class="token punctuation" style="color:rgb(248, 248, 242)">.</span><span class="token plain">prop</span><span class="token punctuation" style="color:rgb(248, 248, 242)">.</span><span class="token plain">accessKey</span><span class="token operator">=</span><span class="token plain">AKIARCOCT6DWLUB7F77Z</span><span class="token punctuation" style="color:rgb(248, 248, 242)">,</span><span class="token plain"></span></span><span class="token-line" 
style="color:#F8F8F2"><span class="token plain"> input</span><span class="token punctuation" style="color:rgb(248, 248, 242)">.</span><span class="token plain">fs</span><span class="token punctuation" style="color:rgb(248, 248, 242)">.</span><span class="token plain">prop</span><span class="token punctuation" style="color:rgb(248, 248, 242)">.</span><span class="token plain">secretKey</span><span class="token operator">=</span><span class="token plain">gfz71RX</span><span class="token operator">+</span><span class="token plain">Tj4udve43YePCBqMsIeN1PvHXrVFyxJS</span><span class="token punctuation" style="color:rgb(248, 248, 242)">,</span><span class="token plain"></span></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"> input</span><span class="token punctuation" style="color:rgb(248, 248, 242)">.</span><span class="token plain">fs</span><span class="token punctuation" style="color:rgb(248, 248, 242)">.</span><span class="token plain">prop</span><span class="token punctuation" style="color:rgb(248, 248, 242)">.</span><span class="token plain">region</span><span class="token operator">=</span><span class="token plain">eu</span><span class="token operator">-</span><span class="token plain">west</span><span class="token operator">-</span><span class="token number">2</span><span class="token plain"></span></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"></span><span class="token punctuation" style="color:rgb(248, 248, 242)">)</span><span class="token punctuation" style="color:rgb(248, 248, 242)">;</span></span></code></pre><button type="button" aria-label="Copy code to clipboard" class="copyButton_M3SB clean-btn">Copy</button></div></div><p>If we run this query, we’ll see the following output:</p><p><img src="https://www.datocms-assets.com/75153/1668701654-image5.png" alt="Sample events_OFFLINE query result" title="Sample events_OFFLINE query result"></p><p>We can check on the state of the ingestion job via 
the Swagger REST API. If we navigate to <a href="http://localhost:9000/help#/Task/getTaskState" target="_blank" rel="noopener noreferrer">http://localhost:9000/help#/Task/getTaskState</a>, paste Task_SegmentGenerationAndPushTask_events-task as our task name, and then click Execute, we’ll see the following:</p><p><img src="https://www.datocms-assets.com/75153/1668701727-image2.png" alt="Checking the state of an ingestion job screen" title="Checking the state of an ingestion job screen"></p><p>If we see the state COMPLETED, this means the data has been ingested, which we can confirm by going back to the Query Console and clicking on the events table.</p><h2><a aria-hidden="true" tabindex="-1" class="anchor" id="summary"></a>Summary<a class="hash-link" href="#summary" title="Direct link to heading">#</a></h2><p>I have to say that batch ingestion of data into Apache Pinot has always felt a bit clunky, but with this new clause, it’s super easy, and it’s gonna save us all a bunch of time.</p><p>Also, anything that means I’m not writing YAML files has got to be a good thing!</p><p>So give it a try and let us know how you get on. 
If you have any questions about this feature, feel free to join us on <a href="https://stree.ai/slack" target="_blank" rel="noopener noreferrer">Slack</a>, where we’ll be happy to help you out.</p></div><footer class="row docusaurus-mt-lg blogPostDetailsFull_xD8n"><div class="col"><b>Tags:</b><a class="margin-horiz--sm" href="/blog/tags/pinot">Pinot</a><a class="margin-horiz--sm" href="/blog/tags/data">Data</a><a class="margin-horiz--sm" href="/blog/tags/analytics">Analytics</a><a class="margin-horiz--sm" href="/blog/tags/user-facing-analytics">User-Facing Analytics</a><a class="margin-horiz--sm" href="/blog/tags/insert">Insert</a></div></footer></article><nav class="pagination-nav docusaurus-mt-lg" aria-label="Blog post page navigation"><div class="pagination-nav__item"><a class="pagination-nav__link" href="/blog/2022/11/22/Apache-Pinot-Timestamp-Indexes"><div class="pagination-nav__sublabel">Newer Post</div><div class="pagination-nav__label">« Apache Pinot™ 0.11 - Timestamp Indexes</div></a></div><div class="pagination-nav__item pagination-nav__item--next"><a class="pagination-nav__link" href="/blog/2022/11/08/Apache Pinot-How-do-I-see-my-indexes"><div class="pagination-nav__sublabel">Older Post</div><div class="pagination-nav__label">Apache Pinot™ 0.11 - How do I see my indexes? 
»</div></a></div></nav></main><div class="col col--2"><div class="tableOfContents_vrFS thin-scrollbar"><ul class="table-of-contents table-of-contents__left-border"><li><a href="#batch-importing-the-job-specification" class="table-of-contents__link">Batch importing: The Job Specification</a></li><li><a href="#batch-importing-with-sql" class="table-of-contents__link">Batch Importing with SQL</a></li><li><a href="#summary" class="table-of-contents__link">Summary</a></li></ul></div></div></div></div></div><footer class="footer"><div class="container"><div class="row footer__links"><div class="col footer__col"><h4 class="footer__title">About</h4><ul class="footer__items"><li class="footer__item"><a href="https://docs.pinot.apache.org/" target="_blank" rel="noopener noreferrer" class="footer__link-item">What is Apache Pinot?</a></li><li class="footer__item"><a class="footer__link-item" href="/who_uses">Who uses Apache Pinot?</a></li><li class="footer__item"><a href="https://docs.pinot.apache.org/pinot-components" target="_blank" rel="noopener noreferrer" class="footer__link-item">Components</a></li><li class="footer__item"><a href="https://docs.pinot.apache.org/basics/architecture" target="_blank" rel="noopener noreferrer" class="footer__link-item">Architecture</a></li><li class="footer__item"><a href="https://docs.pinot.apache.org/plugins/plugin-architecture" target="_blank" rel="noopener noreferrer" class="footer__link-item">Plugins Architecture</a></li></ul></div><div class="col footer__col"><h4 class="footer__title">Integrations</h4><ul class="footer__items"><li class="footer__item"><a href="https://docs.pinot.apache.org/integrations/trino" target="_blank" rel="noopener noreferrer" class="footer__link-item">Trino</a></li><li class="footer__item"><a href="https://docs.pinot.apache.org/integrations/presto" target="_blank" rel="noopener noreferrer" class="footer__link-item">Presto</a></li><li class="footer__item"><a 
href="https://docs.pinot.apache.org/integrations/superset" target="_blank" rel="noopener noreferrer" class="footer__link-item">Superset</a></li><li class="footer__item"><a href="https://docs.pinot.apache.org/integrations/thirdeye" target="_blank" rel="noopener noreferrer" class="footer__link-item">ThirdEye</a></li></ul></div><div class="col footer__col"><h4 class="footer__title">Docs</h4><ul class="footer__items"><li class="footer__item"><a href="https://docs.pinot.apache.org/getting-started" target="_blank" rel="noopener noreferrer" class="footer__link-item">Getting Started</a></li><li class="footer__item"><a href="https://docs.pinot.apache.org/pinot-components" target="_blank" rel="noopener noreferrer" class="footer__link-item">Pinot Components</a></li><li class="footer__item"><a href="https://docs.pinot.apache.org/users" target="_blank" rel="noopener noreferrer" class="footer__link-item">User Guide</a></li><li class="footer__item"><a href="https://docs.pinot.apache.org/operators/operating-pinot" target="_blank" rel="noopener noreferrer" class="footer__link-item">Administration</a></li></ul></div><div class="col footer__col"><h4 class="footer__title">Community</h4><ul class="footer__items"><li class="footer__item"><a href="https://join.slack.com/t/apache-pinot/shared_invite/zt-5z7pav2f-yYtjZdVA~EDmrGkho87Vzw" target="_blank" rel="noopener noreferrer" class="footer__link-item">Slack</a></li><li class="footer__item"><a href="https://github.com/apache/pinot" target="_blank" rel="noopener noreferrer" class="footer__link-item">GitHub</a></li><li class="footer__item"><a href="https://twitter.com/ApachePinot" target="_blank" rel="noopener noreferrer" class="footer__link-item">Twitter</a></li><li class="footer__item"><a href="mailto:dev-subscribe@pinot.apache.org?Subject=SubscribeToPinot" target="_blank" rel="noopener noreferrer" class="footer__link-item">Mailing List</a></li></ul></div><div class="col footer__col"><h4 class="footer__title">Apache</h4><ul 
class="footer__items"><li class="footer__item"><a href="https://www.apache.org/events/current-event" target="_blank" rel="noopener noreferrer" class="footer__link-item">Events</a></li><li class="footer__item"><a href="https://www.apache.org/foundation/thanks.html" target="_blank" rel="noopener noreferrer" class="footer__link-item">Thanks</a></li><li class="footer__item"><a href="https://www.apache.org/licenses" target="_blank" rel="noopener noreferrer" class="footer__link-item">License</a></li><li class="footer__item"><a href="https://www.apache.org/security" target="_blank" rel="noopener noreferrer" class="footer__link-item">Security</a></li><li class="footer__item"><a href="https://www.apache.org/foundation/sponsorship.html" target="_blank" rel="noopener noreferrer" class="footer__link-item">Sponsorship</a></li><li class="footer__item"><a href="https://www.apache.org" target="_blank" rel="noopener noreferrer" class="footer__link-item">Foundation</a></li></ul></div></div><div class="footer__bottom text--center"><div class="margin-bottom--sm"><a href="https://pinot.apache.org/" target="_blank" rel="noopener noreferrer" class="footerLogoLink_94kH"><img src="/img/logo.svg" alt="Apache Pinot™" class="themedImage_TMUO themedImage--light_4Vu1 footer__logo"><img src="/img/logo.svg" alt="Apache Pinot™" class="themedImage_TMUO themedImage--dark_uzRr footer__logo"></a></div><div class="footerCopyright_-piB">Copyright © 2024 The Apache Software Foundation.<br>Apache Pinot, Pinot, Apache, the Apache feather logo, and the Apache Pinot project logo are registered trademarks of The Apache Software Foundation.<br><br>This page has references to third party software - Presto, PrestoDB, ThirdEye, Trino, TrinoDB, that are not part of the Apache Software Foundation and are not covered under the Apache License.</div></div></div></footer></div> |