<!DOCTYPE html><html lang="en-us" class="__variable_1fc36d scroll-smooth"><head><meta charSet="utf-8"/><meta name="viewport" content="width=device-width, initial-scale=1"/><link rel="preload" href="/_next/static/media/6905431624c34d00-s.p.woff2" as="font" crossorigin="" type="font/woff2"/><link rel="preload" as="image" href="https://miro.medium.com/v2/resize:fit:4800/0*gEyUq_Tp4Fycgp2r"/><link rel="preload" as="image" href="https://miro.medium.com/v2/resize:fit:1064/0*GE3P1fqAcsr0Xs5A"/><link rel="preload" as="image" href="https://miro.medium.com/v2/resize:fit:2000/0*SMxDZNndFwpoeNMI"/><link rel="preload" as="image" href="https://miro.medium.com/v2/resize:fit:1068/0*LB1itt-wCohpz42i"/><link rel="preload" as="image" href="https://miro.medium.com/v2/resize:fit:2000/0*cmx4zxoMsD4-tR_u"/><link rel="stylesheet" href="/_next/static/css/9e925a33b1acdac1.css" crossorigin="" data-precedence="next"/><link rel="stylesheet" href="/_next/static/css/c130d1629644f070.css" crossorigin="" data-precedence="next"/><link rel="preload" as="script" fetchPriority="low" href="/_next/static/chunks/webpack-dde39ac7c1b4eb4b.js" crossorigin=""/><script src="/_next/static/chunks/fd9d1056-f172993f20f9bb67.js" async="" crossorigin=""></script><script src="/_next/static/chunks/472-928e738895d89765.js" async="" crossorigin=""></script><script src="/_next/static/chunks/main-app-344a87763a0866a6.js" async="" crossorigin=""></script><script src="/_next/static/chunks/326-3a90a6443b9c824c.js" async=""></script><script src="/_next/static/chunks/980-6e243f9cd384c7d2.js" async=""></script><script src="/_next/static/chunks/702-a2bf9fe707814b79.js" async=""></script><script src="/_next/static/chunks/app/layout-776a485845c720ef.js" async=""></script><script src="/_next/static/chunks/413-f9f40b83f7bb3f22.js" async=""></script><script src="/_next/static/chunks/app/blog/%5B...slug%5D/page-502e08b6677b55da.js" async=""></script><link rel="preload" href="https://www.googletagmanager.com/gtag/js?id=G-ZXG79NJEBY" as="script"/><title>Segment Compaction for Upsert Enabled Tables in Apache Pinot | Apache Pinot™</title><meta name="description" content="The blog post discusses the feature contribution of segment compaction to Apache Pinot project, addressing the issue of older records consuming unnecessary storage space. It explains the configuration and impact of segment compaction in freeing up storage. The author expresses gratitude and offers support for questions or feedback on segment compaction."/><meta name="robots" content="index, follow"/><meta name="googlebot" content="index, follow, max-video-preview:-1, max-image-preview:large, max-snippet:-1"/><link rel="canonical" href="https://pinot.apache.org/blog/2023/08/04/segment-compaction-for-upsert-enabled-tables-in-apache-pinot-3f30657aa077"/><link rel="alternate" type="application/rss+xml" href="https://pinot.apache.org/feed.xml"/><meta property="og:title" content="Segment Compaction for Upsert Enabled Tables in Apache Pinot"/><meta property="og:description" content="The blog post discusses the feature contribution of segment compaction to Apache Pinot project, addressing the issue of older records consuming unnecessary storage space. It explains the configuration and impact of segment compaction in freeing up storage. The author expresses gratitude and offers support for questions or feedback on segment compaction."/><meta property="og:url" content="https://pinot.apache.org/blog/2023/08/04/segment-compaction-for-upsert-enabled-tables-in-apache-pinot-3f30657aa077"/><meta property="og:site_name" content="Apache Pinot™"/><meta property="og:locale" content="en_US"/><meta property="og:image" content="https://pinot.apache.org/static/images/twitter-card.png"/><meta property="og:type" content="article"/><meta property="article:published_time" content="2023-08-04T00:00:00.000Z"/><meta property="article:modified_time" content="2023-08-04T00:00:00.000Z"/><meta property="article:author" content="Robert Zych"/><meta name="twitter:card" content="summary_large_image"/><meta name="twitter:title" content="Segment Compaction for Upsert Enabled Tables in Apache Pinot"/><meta name="twitter:description" content="The blog post discusses the feature contribution of segment compaction to Apache Pinot project, addressing the issue of older records consuming unnecessary storage space. It explains the configuration and impact of segment compaction in freeing up storage. The author expresses gratitude and offers support for questions or feedback on segment compaction."/><meta name="twitter:image" content="https://pinot.apache.org/static/images/twitter-card.png"/><meta name="next-size-adjust"/><meta http-equiv="Content-Security-Policy" content="default-src &#x27;self&#x27;;script-src &#x27;self&#x27; &#x27;unsafe-eval&#x27; &#x27;unsafe-inline&#x27; giscus.app analytics.umami.is www.youtube.com www.googletagmanager.com www.google-analytics.com;style-src &#x27;self&#x27; &#x27;unsafe-inline&#x27;;img-src * blob: data:;media-src *.s3.amazonaws.com;connect-src *;font-src &#x27;self&#x27;;frame-src www.youtube.com youtube.com giscus.app youtu.be https://www.youtube.com https://youtube.com;"/><link rel="apple-touch-icon" sizes="76x76" href="/static/favicons/apple-touch-icon.png"/><link rel="icon" type="image/png" sizes="32x32" href="/static/favicons/favicon-32x32.png"/><link rel="icon" type="image/png" sizes="16x16" href="/static/favicons/favicon-16x16.png"/><link rel="manifest" href="/static/favicons/site.webmanifest"/><link rel="mask-icon" href="/static/favicons/safari-pinned-tab.svg" color="#5bbad5"/><meta name="msapplication-TileColor" content="#000000"/><meta name="theme-color" media="(prefers-color-scheme: light)" content="#fff"/><meta name="theme-color" media="(prefers-color-scheme: dark)" content="#000"/><link rel="alternate" type="application/rss+xml" href="/feed.xml"/><script src="/_next/static/chunks/polyfills-c67a75d1b6f99dc8.js" crossorigin="" noModule=""></script></head><body class="bg-white text-black antialiased dark:bg-gray-950 dark:text-white"><script>!function(){try{var d=document.documentElement,c=d.classList;c.remove('light','dark');var e=localStorage.getItem('theme');if('system'===e||(!e&&false)){var t='(prefers-color-scheme: dark)',m=window.matchMedia(t);if(m.media!==t||m.matches){d.style.colorScheme = 'dark';c.add('dark')}else{d.style.colorScheme = 'light';c.add('light')}}else if(e){c.add(e|| '')}else{c.add('light')}if(e==='light'||e==='dark'||!e)d.style.colorScheme=e||'light'}catch(e){}}()</script><div class="mx-auto flex max-w-screen-customDesktop flex-col justify-between font-sans"><div class="inset-x-0 top-0 z-50 flex text-center text-base sm:text-left"><div class="flex w-full flex-col items-center justify-center bg-sky-200 pt-1 md:flex-row md:pt-0"><div class="flex flex-wrap items-center justify-center md:justify-start">🎉🎉🎉 Announcing the release of Apache Pinot 1.1.0</div><div class="flex items-center justify-center"><a href="https://github.com/apache/pinot/releases/tag/release-1.1.0" target="_blank" class="inline-flex items-center justify-center whitespace-nowrap rounded-md ring-offset-background transition-colors focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-ring focus-visible:ring-offset-2 disabled:pointer-events-none disabled:opacity-50 underline-offset-4 hover:text-vine-120 h-10 px-4 py-2 mr-2 text-base font-semibold leading-tight text-vine-100">learn more<svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-arrow-right mr-2 h-5 w-5"><path d="M5 12h14"></path><path d="m12 5 7 7-7 7"></path></svg></a></div></div></div><header class="border-b-1 flex items-center justify-between border-b px-5 py-3 md:px-[4rem] md:py-4"><div class="flex"><a aria-label="" href="/"><div class="flex items-center justify-between"><div class=""><svg xmlns="http://www.w3.org/2000/svg" width="120" height="48" fill="none"><g fill="#C7154A" clip-path="url(#logo_svg__a)"><path d="M42.99 18.448c1.032-.553 2.21-.831 3.535-.831 1.542 0 2.938.38 4.187 1.14 1.248.76 2.236 1.841 2.965 3.241.728 1.402 1.091 3.025 1.091 4.872s-.363 3.482-1.091 4.903c-.729 1.424-1.717 2.525-2.965 3.307-1.25.782-2.645 1.173-4.187 1.173-1.325 0-2.493-.271-3.503-.815-1.01-.543-1.83-1.226-2.46-2.053v14.612H36V17.912h4.562v2.606c.586-.825 1.395-1.515 2.426-2.068l.002-.002m6.452 5.605c-.445-.793-1.032-1.395-1.76-1.808a4.72 4.72 0 0 0-2.362-.618c-.847 0-1.602.211-2.33.635-.728.423-1.315 1.038-1.76 1.841-.445.804-.668 1.749-.668 2.835 0 1.087.221 2.032.668 2.835.445.804 1.032 1.417 1.76 1.842a4.557 4.557 0 0 0 2.33.635 4.57 4.57 0 0 0 2.362-.652c.728-.435 1.313-1.053 1.76-1.856.445-.804.668-1.76.668-2.867s-.223-2.025-.668-2.818v-.004M62.947 17.912v18.051h-4.562V17.912h4.562m.551-6.079a2.833 2.833 0 1 1-5.666 0 2.833 2.833 0 0 1 5.666 0M82.954 19.687c1.325 1.358 1.988 3.253 1.988 5.685v10.59H80.38v-9.97c0-1.434-.358-2.537-1.075-3.307-.717-.772-1.695-1.157-2.933-1.157-1.239 0-2.254.387-2.982 1.157-.728.772-1.091 1.873-1.091 3.307v9.97h-4.562V17.91h4.562v2.248a6.322 6.322 0 0 1 2.33-1.841c.944-.445 1.981-.669 3.111-.669 2.15 0 3.889.68 5.214 2.037v.002M92.892 35.098c-1.39-.77-2.482-1.861-3.275-3.275-.794-1.411-1.19-3.041-1.19-4.888s.406-3.475 1.221-4.888a8.502 8.502 0 0 1 3.34-3.275c1.412-.772 2.987-1.157 4.725-1.157 1.739 0 3.312.387 4.725 1.157a8.5 8.5 0 0 1 3.34 3.275c.815 1.411 1.222 3.041 1.222 4.888s-.418 3.475-1.255 4.888a8.708 8.708 0 0 1-3.388 3.275c-1.424.772-3.014 1.157-4.774 1.157-1.76 0-3.301-.385-4.691-1.157m7.021-3.421c.729-.402 1.309-1.005 1.744-1.809.435-.803.651-1.781.651-2.933 0-1.715-.451-3.035-1.351-3.958-.902-.924-2.004-1.385-3.307-1.385s-2.395.461-3.275 1.385c-.88.923-1.32 2.243-1.32 3.958 0 1.715.428 3.035 1.287 3.958.858.924 1.938 1.385 3.241 1.385.825 0 1.602-.2 2.33-.603v.002M115.96 21.658v8.734c0 .608.147 1.048.44 1.32.293.271.787.406 1.482.406H120v3.845h-2.867c-3.845 0-5.766-1.868-5.766-5.605v-8.7h-2.15v-3.746h2.15V13l4.595-1v5.912h4.04v3.746h-4.042M20.03 46.757l-5.538-1.385A1.97 1.97 0 0 1 13 43.46v-5.462c0-.841.349-1.601.907-2.146a12.212 12.212 0 0 0 6.975-3.644c2.602-2.731 3.627-6.578 2.882-10.251L21 9h-4V4a1 1 0 0 0-2 0v7a1 1 0 0 1-2 0v-1a1 1 0 0 0-2 0v6.758a4.489 4.489 0 0 1 2.694-.755c2.278.095 4.156 1.934 4.297 4.21a4.501 4.501 0 0 1-6.992 4.029V29a1 1 0 0 1-2 0V7a1 1 0 0 0-2 0v2h-4L.237 21.957c-.745 3.675.279 7.52 2.882 10.251a12.202 12.202 0 0 0 6.975 3.644c.558.545.907 1.305.907 2.146V43.4c0 .938-.639 1.757-1.55 1.985l-5.48 1.37c-.57.143-.97.655-.97 1.243h18c0-.588-.4-1.1-.97-1.243v.002"></path><path d="M13.5 23a2.5 2.5 0 1 0 0-5 2.5 2.5 0 0 0 0 5M8 5a1 1 0 1 0 0-2 1 1 0 0 0 0 2M12 8a1 1 0 1 0 0-2 1 1 0 0 0 0 2M16 2a1 1 0 1 0 0-2 1 1 0 0 0 0 2"></path></g><defs><clipPath id="logo_svg__a"><path fill="#fff" d="M0 0h120v48H0z"></path></clipPath></defs></svg></div><div class="hidden h-6 text-2xl font-semibold sm:block"></div></div></a><div class="ml-[4.5rem] flex items-center gap-12 text-lg leading-5"><a target="_blank" rel="noopener noreferrer" href="https://docs.pinot.apache.org" class="hidden sm:block font-medium text-gray-900 dark:text-gray-100">Docs</a><a class="hidden sm:block font-medium text-gray-900 dark:text-gray-100" href="/download">Download</a><a class="hidden sm:block font-medium text-gray-900 dark:text-gray-100" href="/powered-by">Powered by</a><a class="hidden sm:block font-medium text-gray-900 dark:text-gray-100" href="/blog">Blog</a></div></div><button aria-label="Toggle Menu" class="sm:hidden"><svg xmlns="http://www.w3.org/2000/svg" width="32" height="32" fill="none"><mask id="menu_svg__a" width="32" height="32" x="0" y="0" maskUnits="userSpaceOnUse" style="mask-type:alpha"><path fill="#D9D9D9" d="M0 0h32v32H0z"></path></mask><g mask="url(#menu_svg__a)"><path fill="#201F1F" d="M4.667 23.513v-2h22.666v2H4.667Zm0-6.513v-2h22.666v2H4.667Zm0-6.513v-2h22.666v2H4.667Z"></path></g></svg></button><div class="hidden gap-3 sm:flex"><button aria-label="Search"><svg xmlns="http://www.w3.org/2000/svg" fill="none" viewBox="0 0 24 24" stroke-width="1.5" stroke="currentColor" class="h-6 w-6 text-gray-900 dark:text-gray-100"><path stroke-linecap="round" stroke-linejoin="round" d="M21 21l-5.197-5.197m0 0A7.5 7.5 0 105.196 5.196a7.5 7.5 0 0010.607 10.607z"></path></svg></button><a target="_blank" rel="noopener noreferrer" href="https://github.com/apache/pinot" class="inline-flex items-center justify-center whitespace-nowrap font-medium ring-offset-background transition-colors focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-ring focus-visible:ring-offset-2 disabled:pointer-events-none disabled:opacity-50 border border-input bg-background hover:bg-accent hover:text-accent-foreground rounded-md px-3 py-2 text-base"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" fill="currentColor" class="mr-2"><g clip-path="url(#github_svg__a)"><path fill-rule="evenodd" d="M12.01 0C5.369 0 0 5.5 0 12.304c0 5.44 3.44 10.043 8.212 11.673.597.122.815-.265.815-.59 0-.286-.02-1.264-.02-2.283-3.34.734-4.036-1.466-4.036-1.466-.537-1.426-1.332-1.793-1.332-1.793-1.094-.754.08-.754.08-.754 1.212.082 1.849 1.263 1.849 1.263 1.073 1.874 2.803 1.345 3.5 1.019.098-.795.417-1.345.755-1.65-2.665-.285-5.468-1.345-5.468-6.07 0-1.345.477-2.445 1.232-3.3-.119-.306-.537-1.57.12-3.26 0 0 1.014-.326 3.3 1.263.98-.27 1.989-.407 3.003-.408 1.014 0 2.048.143 3.002.408 2.287-1.59 3.301-1.263 3.301-1.263.657 1.69.239 2.954.12 3.26.775.855 1.232 1.955 1.232 3.3 0 4.725-2.803 5.764-5.488 6.07.438.387.815 1.12.815 2.281 0 1.65-.02 2.975-.02 3.382 0 .326.22.713.816.59C20.56 22.347 24 17.744 24 12.305 24.02 5.5 18.63 0 12.01 0" clip-rule="evenodd"></path></g><defs><clipPath id="github_svg__a"><path fill="#fff" d="M0 0h24v24H0z"></path></clipPath></defs></svg>3.5k</a><button class="inline-flex items-center justify-center whitespace-nowrap font-medium ring-offset-background transition-colors focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-ring focus-visible:ring-offset-2 disabled:pointer-events-none disabled:opacity-50 text-primary-foreground hover:bg-vine-120 rounded-md bg-vine-100 px-6 py-2 text-base"><a target="_blank" rel="noopener noreferrer" href="https://docs.pinot.apache.org/basics/getting-started">Get Started</a></button></div></header><main><script type="application/ld+json">{"@context":"https://schema.org","@type":"BlogPosting","headline":"Segment Compaction for Upsert Enabled Tables in Apache Pinot","datePublished":"2023-08-04T00:00:00.000Z","dateModified":"2023-08-04T00:00:00.000Z","description":"The blog post discusses the feature contribution of segment compaction to Apache Pinot project, addressing the issue of older records consuming unnecessary storage space. It explains the configuration and impact of segment compaction in freeing up storage. The author expresses gratitude and offers support for questions or feedback on segment compaction.","image":"/static/images/twitter-card.png","url":"https://pinot.apache.org/blog/2023-08-04-segment-compaction-for-upsert-enabled-tables-in-apache-pinot-3f30657aa077","author":[{"@type":"Person","name":"Robert Zych"}]}</script><section class=" px-5 pt-10 md:px-[13.313rem] md:py-16"><div class="fixed bottom-8 right-8 hidden flex-col gap-3 md:hidden"><button aria-label="Scroll To Top" class="rounded-full bg-gray-200 p-2 text-gray-500 transition-all hover:bg-gray-300 dark:bg-gray-700 dark:text-gray-400 dark:hover:bg-gray-600"><svg class="h-5 w-5" viewBox="0 0 20 20" fill="currentColor"><path fill-rule="evenodd" d="M3.293 9.707a1 1 0 010-1.414l6-6a1 1 0 011.414 0l6 6a1 1 0 01-1.414 1.414L11 5.414V17a1 1 0 11-2 0V5.414L4.707 9.707a1 1 0 01-1.414 0z" clip-rule="evenodd"></path></svg></button></div><article class=""><div class="mx-auto lg:flex"><div class="lg:pr-12"><header class="pt-6 md:pr-10"><h1 class="text-4xl font-semibold">Segment Compaction for Upsert Enabled Tables in Apache Pinot</h1><p class="pt-2 text-lg">By: <!-- -->Robert Zych</p><p class="py-2 text-sm">August 4th, 2023<!-- --> • <!-- -->4 min read</p></header><div class="flex flex-col lg:flex-row"><main class=""><div class="prose max-w-[45rem] pb-8 pt-10 dark:prose-invert"><p>I’m happy to share that my 1st feature contribution to the Apache Pinot project (<a target="_blank" rel="noopener noreferrer" href="https://github.com/apache/pinot/pull/10463">Segment compaction for upsert enabled real-time tables</a>) was merged recently! In this post, I will briefly discuss the problem segment compaction addresses, how to configure it, and what it looks like in action. If you’re unfamiliar with Pinot’s Upsert features, I recommend reviewing <a target="_blank" rel="noopener noreferrer" href="https://dev.startree.ai/docs/pinot/recipes/upserts-full">Full Upserts in Pinot</a> to get started and <a target="_blank" rel="noopener noreferrer" href="https://docs.pinot.apache.org/basics/data-import/upsert">Stream Ingestion with Upsert</a> for more information.</p><h2 id="context-and-configuration"><a href="#context-and-configuration" aria-hidden="true" tabindex="-1"><span class="icon icon-link"></span></a>Context and Configuration</h2><p>As Pinot’s Upsert stores all versions of the record ingested into immutable segments on disk, older records unnecessarily consume valuable storage space when they’re no longer used in query results. Pinot’s Segment Compaction reclaims this valuable storage space by introducing a periodic process that replaces the completed segments with compacted segments which only contain the latest version of the records. I recommend reviewing the Minion documentation if you’re unfamiliar with Pinot’s ability to run periodic processes.</p><p>With task scheduling enabled and an available Minion, you can configure segment compaction by adding the following to your table’s config.</p><div class="relative"><pre><code class="language-json code-highlight"><span class="code-line"><span class="token property">&quot;task&quot;</span><span class="token operator">:</span> <span class="token punctuation">{</span>
</span><span class="code-line">  <span class="token property">&quot;taskTypeConfigsMap&quot;</span><span class="token operator">:</span> <span class="token punctuation">{</span>
</span><span class="code-line">    <span class="token property">&quot;UpsertCompactionTask&quot;</span><span class="token operator">:</span> <span class="token punctuation">{</span>
</span><span class="code-line">      <span class="token property">&quot;schedule&quot;</span><span class="token operator">:</span> <span class="token string">&quot;0 */5 * ? * *&quot;</span><span class="token punctuation">,</span>
</span><span class="code-line">      <span class="token property">&quot;bufferTimePeriod&quot;</span><span class="token operator">:</span> <span class="token string">&quot;7d&quot;</span><span class="token punctuation">,</span>
</span><span class="code-line">      <span class="token property">&quot;invalidRecordsThresholdPercent&quot;</span><span class="token operator">:</span> <span class="token string">&quot;30&quot;</span><span class="token punctuation">,</span>
</span><span class="code-line">      <span class="token property">&quot;invalidRecordsThresholdCount&quot;</span><span class="token operator">:</span> <span class="token string">&quot;100000&quot;</span>
</span><span class="code-line">    <span class="token punctuation">}</span>
</span><span class="code-line">  <span class="token punctuation">}</span>
</span><span class="code-line"><span class="token punctuation">}</span>
</span></code></pre></div><p>All the configs above (excluding schedule) determine which completed segments are selected for compaction.</p><p>bufferTimePeriod is the amount of time that has elapsed since the segment was consuming. In the example above, this has been set to “7d” which means that any segment that was completed over 7 days ago may be eligible for compaction. However, if you want to ensure that segments are compacted without any additional delay this config can be set to “0d”.</p><p>invalidRecordsThresholdPercent is a limit to the amount of older records allowed in the completed segment represented as a percentage of the total number of records in the segment (i.e. old records / total records). In the example above, this has been set to “30” which means that if more than 30% of the records in the completed segment are old, then the segment may be selected for compaction. As segment compaction is an expensive operation, it is not recommended to set this config (or invalidRecordsThresholdCount) too close to 1. This config is optional on the condition that invalidRecordsThresholdCount has been set and can be used in conjunction with invalidRecordsThresholdCount.</p><p>invalidRecordsThresholdCount is also a limit similar to invalidRecordsThresholdPercent, but allows you to express the threshold as a record count. In the example above, this has been set to “100000” which means that if the segment contains more than 100K records then it may be selected for compaction.</p><h2 id="example-use-case"><a href="#example-use-case" aria-hidden="true" tabindex="-1"><span class="icon icon-link"></span></a>Example Use Case</h2><p>I’ve created a data set that includes 24M records. The data set contains 240K unique keys that have each been duplicated 100 times.</p><p><img alt="alt" src="https://miro.medium.com/v2/resize:fit:4800/0*gEyUq_Tp4Fycgp2r"/></p><p>After ingesting the data set there are 6 segments (5 completed segments + 1 consuming segment) with a total estimated size of 22.8MB. Submitting the query “set skipUpsert=true; select count(*) from transcript_upsert” before compaction produces the following query result.</p><p><img alt="alt" src="https://miro.medium.com/v2/resize:fit:1064/0*GE3P1fqAcsr0Xs5A"/></p><p>After the compaction tasks are complete, the Minion Task Manager UI reports the following.</p><p><img alt="alt" src="https://miro.medium.com/v2/resize:fit:2000/0*SMxDZNndFwpoeNMI"/></p><p>Segment compaction generates a task for each segment to be compacted. 5 tasks were generated in this case because 90% of the records (3.6–4.5M records) are old in all 5 of the completed segments, therefore exceeding the configured thresholds. If a completed segment only contains old records, it is deleted immediately and a task isn’t generated to compact it.</p><p><img alt="alt" src="https://miro.medium.com/v2/resize:fit:1068/0*LB1itt-wCohpz42i"/></p><p>Submitting the query again we now see that count matches the set of 240K unique keys.</p><p><img alt="alt" src="https://miro.medium.com/v2/resize:fit:2000/0*cmx4zxoMsD4-tR_u"/></p><p>Once compaction has completed and the original segments have been replaced with their compacted counterparts we see that the total number of segments remained the same, but the total estimated size dropped to only 2.77MB! Since compaction can yield very small segments, one improvement would be to merge smaller segments into larger ones as this would improve query latency.</p><h2 id="conclusion"><a href="#conclusion" aria-hidden="true" tabindex="-1"><span class="icon icon-link"></span></a>Conclusion</h2><p>In this brief overview of Segment Compaction I covered the problem it addresses, how you can configure it, and demonstrated its ability to reclaim storage space. I’d like to thank Ankit Sultana, Seunghyun Lee, and especially Jackie Jiang for their feedback and support throughout the design and development stages. If you have any questions or feedback, I’m available on the Apache Pinot Slack.</p></div></main></div></div><aside class="mt-10 hidden border-l-2 pl-5 lg:sticky lg:top-1 lg:block lg:h-full"><section class="sticky top-0 mb-4 w-[15.375rem]"><div class="flex flex-col space-y-1.5 pb-3"><h3 class="text-sm font-semibold leading-snug text-neutral-500 dark:text-neutral-100">Table of Contents</h3></div><nav class="flex items-center self-start" aria-label="Table of Contents"><ol class="list-none space-y-3"><li class="text-sm leading-tight font-bold"><a href="#context-and-configuration">Context and Configuration</a></li><li class="text-sm font-normal leading-tight"><a href="#example-use-case">Example Use Case</a></li><li class="text-sm font-normal leading-tight"><a href="#conclusion">Conclusion</a></li></ol></nav></section></aside></div></article></section></main><footer class="border-t bg-sky-100 px-5 py-10 md:px-[6.75rem] md:pb-10 md:pt-16"><div class="mx-auto flex max-w-7xl flex-wrap justify-between"><div class="flex-shrink-0"><svg xmlns="http://www.w3.org/2000/svg" width="120" height="48" fill="none"><g fill="#C7154A" clip-path="url(#logo_svg__a)"><path d="M42.99 18.448c1.032-.553 2.21-.831 3.535-.831 1.542 0 2.938.38 4.187 1.14 1.248.76 2.236 1.841 2.965 3.241.728 1.402 1.091 3.025 1.091 4.872s-.363 3.482-1.091 4.903c-.729 1.424-1.717 2.525-2.965 3.307-1.25.782-2.645 1.173-4.187 1.173-1.325 0-2.493-.271-3.503-.815-1.01-.543-1.83-1.226-2.46-2.053v14.612H36V17.912h4.562v2.606c.586-.825 1.395-1.515 2.426-2.068l.002-.002m6.452 5.605c-.445-.793-1.032-1.395-1.76-1.808a4.72 4.72 0 0 0-2.362-.618c-.847 0-1.602.211-2.33.635-.728.423-1.315 1.038-1.76 1.841-.445.804-.668 1.749-.668 2.835 0 1.087.221 2.032.668 2.835.445.804 1.032 1.417 1.76 1.842a4.557 4.557 0 0 0 2.33.635 4.57 4.57 0 0 0 2.362-.652c.728-.435 1.313-1.053 1.76-1.856.445-.804.668-1.76.668-2.867s-.223-2.025-.668-2.818v-.004M62.947 17.912v18.051h-4.562V17.912h4.562m.551-6.079a2.833 2.833 0 1 1-5.666 0 2.833 2.833 0 0 1 5.666 0M82.954 19.687c1.325 1.358 1.988 3.253 1.988 5.685v10.59H80.38v-9.97c0-1.434-.358-2.537-1.075-3.307-.717-.772-1.695-1.157-2.933-1.157-1.239 0-2.254.387-2.982 1.157-.728.772-1.091 1.873-1.091 3.307v9.97h-4.562V17.91h4.562v2.248a6.322 6.322 0 0 1 2.33-1.841c.944-.445 1.981-.669 3.111-.669 2.15 0 3.889.68 5.214 2.037v.002M92.892 35.098c-1.39-.77-2.482-1.861-3.275-3.275-.794-1.411-1.19-3.041-1.19-4.888s.406-3.475 1.221-4.888a8.502 8.502 0 0 1 3.34-3.275c1.412-.772 2.987-1.157 4.725-1.157 1.739 0 3.312.387 4.725 1.157a8.5 8.5 0 0 1 3.34 3.275c.815 1.411 1.222 3.041 1.222 4.888s-.418 3.475-1.255 4.888a8.708 8.708 0 0 1-3.388 3.275c-1.424.772-3.014 1.157-4.774 1.157-1.76 0-3.301-.385-4.691-1.157m7.021-3.421c.729-.402 1.309-1.005 1.744-1.809.435-.803.651-1.781.651-2.933 0-1.715-.451-3.035-1.351-3.958-.902-.924-2.004-1.385-3.307-1.385s-2.395.461-3.275 1.385c-.88.923-1.32 2.243-1.32 3.958 0 1.715.428 3.035 1.287 3.958.858.924 1.938 1.385 3.241 1.385.825 0 1.602-.2 2.33-.603v.002M115.96 21.658v8.734c0 .608.147 1.048.44 1.32.293.271.787.406 1.482.406H120v3.845h-2.867c-3.845 0-5.766-1.868-5.766-5.605v-8.7h-2.15v-3.746h2.15V13l4.595-1v5.912h4.04v3.746h-4.042M20.03 46.757l-5.538-1.385A1.97 1.97 0 0 1 13 43.46v-5.462c0-.841.349-1.601.907-2.146a12.212 12.212 0 0 0 6.975-3.644c2.602-2.731 3.627-6.578 2.882-10.251L21 9h-4V4a1 1 0 0 0-2 0v7a1 1 0 0 1-2 0v-1a1 1 0 0 0-2 0v6.758a4.489 4.489 0 0 1 2.694-.755c2.278.095 4.156 1.934 4.297 4.21a4.501 4.501 0 0 1-6.992 4.029V29a1 1 0 0 1-2 0V7a1 1 0 0 0-2 0v2h-4L.237 21.957c-.745 3.675.279 7.52 2.882 10.251a12.202 12.202 0 0 0 6.975 3.644c.558.545.907 1.305.907 2.146V43.4c0 .938-.639 1.757-1.55 1.985l-5.48 1.37c-.57.143-.97.655-.97 1.243h18c0-.588-.4-1.1-.97-1.243v.002"></path><path d="M13.5 23a2.5 2.5 0 1 0 0-5 2.5 2.5 0 0 0 0 5M8 5a1 1 0 1 0 0-2 1 1 0 0 0 0 2M12 8a1 1 0 1 0 0-2 1 1 0 0 0 0 2M16 2a1 1 0 1 0 0-2 1 1 0 0 0 0 2"></path></g><defs><clipPath id="logo_svg__a"><path fill="#fff" d="M0 0h120v48H0z"></path></clipPath></defs></svg></div><div class="flex flex-wrap gap-x-16 gap-y-5 py-8 md:pl-24 md:pr-[21.625rem]"> <div><h5 class="mb-4 text-lg font-semibold">Resources</h5><div class="flex justify-between gap-x-10"><div class="flex flex-col"><a target="_blank" rel="noopener noreferrer" href="https://docs.pinot.apache.org/" class="block py-1 text-gray-600 hover:text-gray-900">Docs</a><a target="_blank" rel="noopener noreferrer" href="https://docs.pinot.apache.org/getting-started" class="block py-1 text-gray-600 hover:text-gray-900">Getting Started</a><a target="_blank" rel="noopener noreferrer" href="https://docs.pinot.apache.org/integrations/thirdeye" class="block py-1 text-gray-600 hover:text-gray-900">ThirdEye</a></div><div class="flex flex-col"><a class="block py-1 text-gray-600 hover:text-gray-900" href="/powered-by">Company Stories</a><a class="block py-1 text-gray-600 hover:text-gray-900" href="/download">Download</a><a class="block py-1 text-gray-600 hover:text-gray-900" href="/blog">Blog</a></div></div></div><div><h5 class="mb-4 text-lg font-semibold">Apache</h5><div class="flex justify-between gap-x-10"><div class="flex flex-col"><a target="_blank" rel="noopener noreferrer" href="https://www.apache.org" class="block py-1 text-gray-600 hover:text-gray-900">Foundation</a><a target="_blank" rel="noopener noreferrer" href="https://www.apache.org/licenses" class="block py-1 text-gray-600 hover:text-gray-900">License</a><a target="_blank" rel="noopener noreferrer" href="https://www.apache.org/security" class="block py-1 text-gray-600 hover:text-gray-900">Security</a></div><div class="flex flex-col"><a target="_blank" rel="noopener noreferrer" href="https://www.apache.org/foundation/sponsorship.html" class="block py-1 text-gray-600 hover:text-gray-900">Sponsorship</a><a target="_blank" rel="noopener noreferrer" href="https://www.apache.org/events/current-event" class="block py-1 text-gray-600 hover:text-gray-900">Events</a><a target="_blank" rel="noopener noreferrer" href="https://www.apache.org/foundation/thanks.html" class="block py-1 text-gray-600 hover:text-gray-900">Thanks</a></div></div></div></div><div class="mt-4 flex justify-center md:mt-0"><a target="_blank" rel="noopener noreferrer" href="https://join.slack.com/t/apache-pinot/shared_invite/zt-5z7pav2f-yYtjZdVA~EDmrGkho87Vzw" class="mr-4"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-slack fill-gray-900"><rect width="3" height="8" x="13" y="2" rx="1.5"></rect><path d="M19 8.5V10h1.5A1.5 1.5 0 1 0 19 8.5"></path><rect width="3" height="8" x="8" y="14" rx="1.5"></rect><path d="M5 15.5V14H3.5A1.5 1.5 0 1 0 5 15.5"></path><rect width="8" height="3" x="14" y="13" rx="1.5"></rect><path d="M15.5 19H14v1.5a1.5 1.5 0 1 0 1.5-1.5"></path><rect width="8" height="3" x="2" y="8" rx="1.5"></rect><path d="M8.5 5H10V3.5A1.5 1.5 0 1 0 8.5 5"></path></svg></a><a target="_blank" rel="noopener noreferrer" href="https://github.com/apache/pinot"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" fill="currentColor" size="24"><g clip-path="url(#github_svg__a)"><path fill-rule="evenodd" d="M12.01 0C5.369 0 0 5.5 0 12.304c0 5.44 3.44 10.043 8.212 11.673.597.122.815-.265.815-.59 0-.286-.02-1.264-.02-2.283-3.34.734-4.036-1.466-4.036-1.466-.537-1.426-1.332-1.793-1.332-1.793-1.094-.754.08-.754.08-.754 1.212.082 1.849 1.263 1.849 1.263 1.073 1.874 2.803 1.345 3.5 1.019.098-.795.417-1.345.755-1.65-2.665-.285-5.468-1.345-5.468-6.07 0-1.345.477-2.445 1.232-3.3-.119-.306-.537-1.57.12-3.26 0 0 1.014-.326 3.3 1.263.98-.27 1.989-.407 3.003-.408 1.014 0 2.048.143 3.002.408 2.287-1.59 3.301-1.263 3.301-1.263.657 1.69.239 2.954.12 3.26.775.855 1.232 1.955 1.232 3.3 0 4.725-2.803 5.764-5.488 6.07.438.387.815 1.12.815 2.281 0 1.65-.02 2.975-.02 3.382 0 .326.22.713.816.59C20.56 22.347 24 17.744 24 12.305 24.02 5.5 18.63 0 12.01 0" clip-rule="evenodd"></path></g><defs><clipPath id="github_svg__a"><path fill="#fff" d="M0 0h24v24H0z"></path></clipPath></defs></svg></a></div></div><div class="mt-8 border-t border-neutral-300 pt-4 text-left text-sm text-gray-600">Copyright © <!-- -->2024<!-- --> The Apache Software Foundation. Apache Pinot, Pinot, Apache, the Apache feather logo, and the Apache Pinot project logo are registered trademarks of The Apache Software Foundation. This page has references to third party software - Presto, PrestoDB, ThirdEye, Trino, TrinoDB, that are not part of the Apache Software Foundation and are not covered under the Apache License.</div></footer></div><script src="/_next/static/chunks/webpack-dde39ac7c1b4eb4b.js" crossorigin="" async=""></script><script>(self.__next_f=self.__next_f||[]).push([0]);self.__next_f.push([2,null])</script><script>self.__next_f.push([1,"1:HL[\"/_next/static/media/6905431624c34d00-s.p.woff2\",\"font\",{\"crossOrigin\":\"\",\"type\":\"font/woff2\"}]\n2:HL[\"/_next/static/css/9e925a33b1acdac1.css\",\"style\",{\"crossOrigin\":\"\"}]\n0:\"$L3\"\n"])</script><script>self.__next_f.push([1,"4:HL[\"/_next/static/css/c130d1629644f070.css\",\"style\",{\"crossOrigin\":\"\"}]\n"])</script><script>self.__next_f.push([1,"5:I[3728,[],\"\"]\n7:I[9928,[],\"\"]\n8:I[7821,[\"326\",\"static/chunks/326-3a90a6443b9c824c.js\",\"980\",\"static/chunks/980-6e243f9cd384c7d2.js\",\"702\",\"static/chunks/702-a2bf9fe707814b79.js\",\"185\",\"static/chunks/app/layout-776a485845c720ef.js\"],\"ThemeProviders\"]\n9:I[3994,[\"326\",\"static/chunks/326-3a90a6443b9c824c.js\",\"980\",\"static/chunks/980-6e243f9cd384c7d2.js\",\"702\",\"static/chunks/702-a2bf9fe707814b79.js\",\"185\",\"static/chunks/app/layout-776a485845c720ef.js\"],\"\"]\na:I[9640,[\"326\",\"static/chunks/326-3a90a6443b9c824c.js"])</script><script>self.__next_f.push([1,"\",\"980\",\"static/chunks/980-6e243f9cd384c7d2.js\",\"702\",\"static/chunks/702-a2bf9fe707814b79.js\",\"185\",\"static/chunks/app/layout-776a485845c720ef.js\"],\"AlgoliaSearchProvider\"]\nb:I[7975,[\"326\",\"static/chunks/326-3a90a6443b9c824c.js\",\"980\",\"static/chunks/980-6e243f9cd384c7d2.js\",\"702\",\"static/chunks/702-a2bf9fe707814b79.js\",\"185\",\"static/chunks/app/layout-776a485845c720ef.js\"],\"\"]\nc:I[6954,[],\"\"]\nd:I[7264,[],\"\"]\ne:I[8326,[\"326\",\"static/chunks/326-3a90a6443b9c824c.js\",\"413\",\"static/chunks/413-f9f40b83f7bb3f22.js\""])</script><script>self.__next_f.push([1,",\"980\",\"static/chunks/980-6e243f9cd384c7d2.js\",\"797\",\"static/chunks/app/blog/%5B...slug%5D/page-502e08b6677b55da.js\"],\"\"]\n11:T9fe,"])</script><script>self.__next_f.push([1,"M42.99 18.448c1.032-.553 2.21-.831 3.535-.831 1.542 0 2.938.38 4.187 1.14 1.248.76 2.236 1.841 2.965 3.241.728 1.402 1.091 3.025 1.091 4.872s-.363 3.482-1.091 4.903c-.729 1.424-1.717 2.525-2.965 3.307-1.25.782-2.645 1.173-4.187 1.173-1.325 0-2.493-.271-3.503-.815-1.01-.543-1.83-1.226-2.46-2.053v14.612H36V17.912h4.562v2.606c.586-.825 1.395-1.515 2.426-2.068l.002-.002m6.452 5.605c-.445-.793-1.032-1.395-1.76-1.808a4.72 4.72 0 0 0-2.362-.618c-.847 0-1.602.211-2.33.635-.728.423-1.315 1.038-1.76 1.841-.445.804-.668 1.749-.668 2.835 0 1.087.221 2.032.668 2.835.445.804 1.032 1.417 1.76 1.842a4.557 4.557 0 0 0 2.33.635 4.57 4.57 0 0 0 2.362-.652c.728-.435 1.313-1.053 1.76-1.856.445-.804.668-1.76.668-2.867s-.223-2.025-.668-2.818v-.004M62.947 17.912v18.051h-4.562V17.912h4.562m.551-6.079a2.833 2.833 0 1 1-5.666 0 2.833 2.833 0 0 1 5.666 0M82.954 19.687c1.325 1.358 1.988 3.253 1.988 5.685v10.59H80.38v-9.97c0-1.434-.358-2.537-1.075-3.307-.717-.772-1.695-1.157-2.933-1.157-1.239 0-2.254.387-2.982 1.157-.728.772-1.091 1.873-1.091 3.307v9.97h-4.562V17.91h4.562v2.248a6.322 6.322 0 0 1 2.33-1.841c.944-.445 1.981-.669 3.111-.669 2.15 0 3.889.68 5.214 2.037v.002M92.892 35.098c-1.39-.77-2.482-1.861-3.275-3.275-.794-1.411-1.19-3.041-1.19-4.888s.406-3.475 1.221-4.888a8.502 8.502 0 0 1 3.34-3.275c1.412-.772 2.987-1.157 4.725-1.157 1.739 0 3.312.387 4.725 1.157a8.5 8.5 0 0 1 3.34 3.275c.815 1.411 1.222 3.041 1.222 4.888s-.418 3.475-1.255 4.888a8.708 8.708 0 0 1-3.388 3.275c-1.424.772-3.014 1.157-4.774 1.157-1.76 0-3.301-.385-4.691-1.157m7.021-3.421c.729-.402 1.309-1.005 1.744-1.809.435-.803.651-1.781.651-2.933 0-1.715-.451-3.035-1.351-3.958-.902-.924-2.004-1.385-3.307-1.385s-2.395.461-3.275 1.385c-.88.923-1.32 2.243-1.32 3.958 0 1.715.428 3.035 1.287 3.958.858.924 1.938 1.385 3.241 1.385.825 0 1.602-.2 2.33-.603v.002M115.96 21.658v8.734c0 .608.147 1.048.44 1.32.293.271.787.406 1.482.406H120v3.845h-2.867c-3.845 0-5.766-1.868-5.766-5.605v-8.7h-2.15v-3.746h2.15V13l4.595-1v5.912h4.04v3.746h-4.042M20.03 46.757l-5.538-1.385A1.97 1.97 0 0 1 13 43.46v-5.462c0-.841.349-1.601.907-2.146a12.212 12.212 0 0 0 6.975-3.644c2.602-2.731 3.627-6.578 2.882-10.251L21 9h-4V4a1 1 0 0 0-2 0v7a1 1 0 0 1-2 0v-1a1 1 0 0 0-2 0v6.758a4.489 4.489 0 0 1 2.694-.755c2.278.095 4.156 1.934 4.297 4.21a4.501 4.501 0 0 1-6.992 4.029V29a1 1 0 0 1-2 0V7a1 1 0 0 0-2 0v2h-4L.237 21.957c-.745 3.675.279 7.52 2.882 10.251a12.202 12.202 0 0 0 6.975 3.644c.558.545.907 1.305.907 2.146V43.4c0 .938-.639 1.757-1.55 1.985l-5.48 1.37c-.57.143-.97.655-.97 1.243h18c0-.588-.4-1.1-.97-1.243v.002"])</script><script>self.__next_f.push([1,"3:[[[\"$\",\"link\",\"0\",{\"rel\":\"stylesheet\",\"href\":\"/_next/static/css/9e925a33b1acdac1.css\",\"precedence\":\"next\",\"crossOrigin\":\"\"}]],[\"$\",\"$L5\",null,{\"buildId\":\"rmcKjFZ3e9kKdH1iJwCIQ\",\"assetPrefix\":\"\",\"initialCanonicalUrl\":\"/blog/2023/08/04/segment-compaction-for-upsert-enabled-tables-in-apache-pinot-3f30657aa077\",\"initialTree\":[\"\",{\"children\":[\"blog\",{\"children\":[[\"slug\",\"2023/08/04/segment-compaction-for-upsert-enabled-tables-in-apache-pinot-3f30657aa077\",\"c\"],{\"children\":[\"__PAGE__?{\\\"slug\\\":[\\\"2023\\\",\\\"08\\\",\\\"04\\\",\\\"segment-compaction-for-upsert-enabled-tables-in-apache-pinot-3f30657aa077\\\"]}\",{}]}]}]},\"$undefined\",\"$undefined\",true],\"initialHead\":[false,\"$L6\"],\"globalErrorComponent\":\"$7\",\"children\":[null,[\"$\",\"html\",null,{\"lang\":\"en-us\",\"className\":\"__variable_1fc36d scroll-smooth\",\"suppressHydrationWarning\":true,\"children\":[[\"$\",\"head\",null,{\"children\":[[\"$\",\"meta\",null,{\"httpEquiv\":\"Content-Security-Policy\",\"content\":\"default-src 'self';script-src 'self' 'unsafe-eval' 'unsafe-inline' giscus.app analytics.umami.is www.youtube.com www.googletagmanager.com www.google-analytics.com;style-src 'self' 'unsafe-inline';img-src * blob: data:;media-src *.s3.amazonaws.com;connect-src *;font-src 'self';frame-src www.youtube.com youtube.com giscus.app youtu.be https://www.youtube.com https://youtube.com;\"}],[\"$\",\"link\",null,{\"rel\":\"apple-touch-icon\",\"sizes\":\"76x76\",\"href\":\"/static/favicons/apple-touch-icon.png\"}],[\"$\",\"link\",null,{\"rel\":\"icon\",\"type\":\"image/png\",\"sizes\":\"32x32\",\"href\":\"/static/favicons/favicon-32x32.png\"}],[\"$\",\"link\",null,{\"rel\":\"icon\",\"type\":\"image/png\",\"sizes\":\"16x16\",\"href\":\"/static/favicons/favicon-16x16.png\"}],[\"$\",\"link\",null,{\"rel\":\"manifest\",\"href\":\"/static/favicons/site.webmanifest\"}],[\"$\",\"link\",null,{\"rel\":\"mask-icon\",\"href\":\"/static/favicons/safari-pinned-tab.svg\",\"color\":\"#5bbad5\"}],[\"$\",\"meta\",null,{\"name\":\"msapplication-TileColor\",\"content\":\"#000000\"}],[\"$\",\"meta\",null,{\"name\":\"theme-color\",\"media\":\"(prefers-color-scheme: light)\",\"content\":\"#fff\"}],[\"$\",\"meta\",null,{\"name\":\"theme-color\",\"media\":\"(prefers-color-scheme: dark)\",\"content\":\"#000\"}],[\"$\",\"link\",null,{\"rel\":\"alternate\",\"type\":\"application/rss+xml\",\"href\":\"/feed.xml\"}]]}],[\"$\",\"body\",null,{\"className\":\"bg-white text-black antialiased dark:bg-gray-950 dark:text-white\",\"children\":[\"$\",\"$L8\",null,{\"children\":[[\"$undefined\",\"$undefined\",\"$undefined\",\"$undefined\",[[\"$\",\"$L9\",null,{\"strategy\":\"afterInteractive\",\"src\":\"https://www.googletagmanager.com/gtag/js?id=G-ZXG79NJEBY\"}],[\"$\",\"$L9\",null,{\"strategy\":\"afterInteractive\",\"id\":\"ga-script\",\"children\":\"\\n            window.dataLayer = window.dataLayer || [];\\n            function gtag(){dataLayer.push(arguments);}\\n            gtag('js', new Date());\\n            gtag('config', 'G-ZXG79NJEBY');\\n        \"}]]],[\"$\",\"div\",null,{\"className\":\"mx-auto flex max-w-screen-customDesktop flex-col justify-between font-sans\",\"children\":[\"$\",\"$La\",null,{\"algoliaConfig\":{\"appId\":\"CKRA00L2X9\",\"apiKey\":\"6531f8f7783a88d76629190843f1801e\",\"indexName\":\"prod_apache_pinot_docs\"},\"children\":[[\"$\",\"$Lb\",null,{}],[\"$\",\"main\",null,{\"children\":[\"$\",\"$Lc\",null,{\"parallelRouterKey\":\"children\",\"segmentPath\":[\"children\"],\"loading\":\"$undefined\",\"loadingStyles\":\"$undefined\",\"loadingScripts\":\"$undefined\",\"hasLoading\":false,\"error\":\"$undefined\",\"errorStyles\":\"$undefined\",\"errorScripts\":\"$undefined\",\"template\":[\"$\",\"$Ld\",null,{}],\"templateStyles\":\"$undefined\",\"templateScripts\":\"$undefined\",\"notFound\":[\"$\",\"div\",null,{\"className\":\"flex flex-col items-start justify-start md:mt-24 md:flex-row md:items-center md:justify-center md:space-x-6\",\"children\":[[\"$\",\"div\",null,{\"className\":\"space-x-2 pb-8 pt-6 md:space-y-5\",\"children\":[\"$\",\"h1\",null,{\"className\":\"text-6xl font-extrabold leading-9 tracking-tight text-gray-900 dark:text-gray-100 md:border-r-2 md:px-6 md:text-8xl md:leading-14\",\"children\":\"404\"}]}],[\"$\",\"div\",null,{\"className\":\"max-w-md\",\"children\":[[\"$\",\"p\",null,{\"className\":\"mb-4 text-xl font-bold leading-normal md:text-2xl\",\"children\":\"Sorry we couldn't find this page.\"}],[\"$\",\"p\",null,{\"className\":\"mb-8\",\"children\":\"But dont worry, you can find plenty of other things on our homepage.\"}],[\"$\",\"$Le\",null,{\"href\":\"/\",\"className\":\"focus:shadow-outline-blue inline rounded-lg border border-transparent bg-blue-600 px-4 py-2 text-sm font-medium leading-5 text-white shadow transition-colors duration-150 hover:bg-blue-700 focus:outline-none dark:hover:bg-blue-500\",\"children\":\"Back to homepage\"}]]}]]}],\"notFoundStyles\":[],\"initialChildNode\":[\"$\",\"$Lc\",null,{\"parallelRouterKey\":\"children\",\"segmentPath\":[\"children\",\"blog\",\"children\"],\"loading\":\"$undefined\",\"loadingStyles\":\"$undefined\",\"loadingScripts\":\"$undefined\",\"hasLoading\":false,\"error\":\"$undefined\",\"errorStyles\":\"$undefined\",\"errorScripts\":\"$undefined\",\"template\":[\"$\",\"$Ld\",null,{}],\"templateStyles\":\"$undefined\",\"templateScripts\":\"$undefined\",\"notFound\":\"$undefined\",\"notFoundStyles\":\"$undefined\",\"initialChildNode\":[\"$\",\"$Lc\",null,{\"parallelRouterKey\":\"children\",\"segmentPath\":[\"children\",\"blog\",\"children\",[\"slug\",\"2023/08/04/segment-compaction-for-upsert-enabled-tables-in-apache-pinot-3f30657aa077\",\"c\"],\"children\"],\"loading\":\"$undefined\",\"loadingStyles\":\"$undefined\",\"loadingScripts\":\"$undefined\",\"hasLoading\":false,\"error\":\"$undefined\",\"errorStyles\":\"$undefined\",\"errorScripts\":\"$undefined\",\"template\":[\"$\",\"$Ld\",null,{}],\"templateStyles\":\"$undefined\",\"templateScripts\":\"$undefined\",\"notFound\":\"$undefined\",\"notFoundStyles\":\"$undefined\",\"initialChildNode\":[\"$Lf\",\"$L10\",null],\"childPropSegment\":\"__PAGE__?{\\\"slug\\\":[\\\"2023\\\",\\\"08\\\",\\\"04\\\",\\\"segment-compaction-for-upsert-enabled-tables-in-apache-pinot-3f30657aa077\\\"]}\",\"styles\":[[\"$\",\"link\",\"0\",{\"rel\":\"stylesheet\",\"href\":\"/_next/static/css/c130d1629644f070.css\",\"precedence\":\"next\",\"crossOrigin\":\"\"}]]}],\"childPropSegment\":[\"slug\",\"2023/08/04/segment-compaction-for-upsert-enabled-tables-in-apache-pinot-3f30657aa077\",\"c\"],\"styles\":null}],\"childPropSegment\":\"blog\",\"styles\":null}]}],[\"$\",\"footer\",null,{\"className\":\"border-t bg-sky-100 px-5 py-10 md:px-[6.75rem] md:pb-10 md:pt-16\",\"children\":[[\"$\",\"div\",null,{\"className\":\"mx-auto flex max-w-7xl flex-wrap justify-between\",\"children\":[[\"$\",\"div\",null,{\"className\":\"flex-shrink-0\",\"children\":[\"$\",\"svg\",null,{\"xmlns\":\"http://www.w3.org/2000/svg\",\"width\":120,\"height\":48,\"fill\":\"none\",\"children\":[[\"$\",\"g\",null,{\"fill\":\"#C7154A\",\"clipPath\":\"url(#logo_svg__a)\",\"children\":[[\"$\",\"path\",null,{\"d\":\"$11\"}],[\"$\",\"path\",null,{\"d\":\"M13.5 23a2.5 2.5 0 1 0 0-5 2.5 2.5 0 0 0 0 5M8 5a1 1 0 1 0 0-2 1 1 0 0 0 0 2M12 8a1 1 0 1 0 0-2 1 1 0 0 0 0 2M16 2a1 1 0 1 0 0-2 1 1 0 0 0 0 2\"}]]}],[\"$\",\"defs\",null,{\"children\":[\"$\",\"clipPath\",null,{\"id\":\"logo_svg__a\",\"children\":[\"$\",\"path\",null,{\"fill\":\"#fff\",\"d\":\"M0 0h120v48H0z\"}]}]}]]}]}],[\"$\",\"div\",null,{\"className\":\"flex flex-wrap gap-x-16 gap-y-5 py-8 md:pl-24 md:pr-[21.625rem]\",\"children\":[\" \",[[\"$\",\"div\",\"Resources\",{\"children\":[[\"$\",\"h5\",null,{\"className\":\"mb-4 text-lg font-semibold\",\"children\":\"Resources\"}],[\"$\",\"div\",null,{\"className\":\"flex justify-between gap-x-10\",\"children\":[[\"$\",\"div\",null,{\"className\":\"flex flex-col\",\"children\":[[\"$\",\"a\",null,{\"target\":\"_blank\",\"rel\":\"noopener noreferrer\",\"href\":\"https://docs.pinot.apache.org/\",\"className\":\"block py-1 text-gray-600 hover:text-gray-900\",\"children\":\"Docs\"}],[\"$\",\"a\",null,{\"target\":\"_blank\",\"rel\":\"noopener noreferrer\",\"href\":\"https://docs.pinot.apache.org/getting-started\",\"className\":\"block py-1 text-gray-600 hover:text-gray-900\",\"children\":\"Getting Started\"}],[\"$\",\"a\",null,{\"target\":\"_blank\",\"rel\":\"noopener noreferrer\",\"href\":\"https://docs.pinot.apache.org/integrations/thirdeye\",\"className\":\"block py-1 text-gray-600 hover:text-gray-900\",\"children\":\"ThirdEye\"}]]}],[\"$\",\"div\",null,{\"className\":\"flex flex-col\",\"children\":[[\"$\",\"$Le\",null,{\"href\":\"/powered-by\",\"className\":\"block py-1 text-gray-600 hover:text-gray-900\",\"children\":\"Company Stories\"}],[\"$\",\"$Le\",null,{\"href\":\"/download\",\"className\":\"block py-1 text-gray-600 hover:text-gray-900\",\"children\":\"Download\"}],[\"$\",\"$Le\",null,{\"href\":\"/blog\",\"className\":\"block py-1 text-gray-600 hover:text-gray-900\",\"children\":\"Blog\"}]]}]]}]]}],[\"$\",\"div\",\"Apache\",{\"children\":[[\"$\",\"h5\",null,{\"className\":\"mb-4 text-lg font-semibold\",\"children\":\"Apache\"}],[\"$\",\"div\",null,{\"className\":\"flex justify-between gap-x-10\",\"children\":[[\"$\",\"div\",null,{\"className\":\"flex flex-col\",\"children\":[[\"$\",\"a\",null,{\"target\":\"_blank\",\"rel\":\"noopener noreferrer\",\"href\":\"https://www.apache.org\",\"className\":\"block py-1 text-gray-600 hover:text-gray-900\",\"children\":\"Foundation\"}],[\"$\",\"a\",null,{\"target\":\"_blank\",\"rel\":\"noopener noreferrer\",\"href\":\"https://www.apache.org/licenses\",\"className\":\"block py-1 text-gray-600 hover:text-gray-900\",\"children\":\"License\"}],[\"$\",\"a\",null,{\"target\":\"_blank\",\"rel\":\"noopener noreferrer\",\"href\":\"https://www.apache.org/security\",\"className\":\"block py-1 text-gray-600 hover:text-gray-900\",\"children\":\"Security\"}]]}],[\"$\",\"div\",null,{\"className\":\"flex flex-col\",\"children\":[[\"$\",\"a\",null,{\"target\":\"_blank\",\"rel\":\"noopener noreferrer\",\"href\":\"https://www.apache.org/foundation/sponsorship.html\",\"className\":\"block py-1 text-gray-600 hover:text-gray-900\",\"children\":\"Sponsorship\"}],[\"$\",\"a\",null,{\"target\":\"_blank\",\"rel\":\"noopener noreferrer\",\"href\":\"https://www.apache.org/events/current-event\",\"className\":\"block py-1 text-gray-600 hover:text-gray-900\",\"children\":\"Events\"}],[\"$\",\"a\",null,{\"target\":\"_blank\",\"rel\":\"noopener noreferrer\",\"href\":\"https://www.apache.org/foundation/thanks.html\",\"className\":\"block py-1 text-gray-600 hover:text-gray-900\",\"children\":\"Thanks\"}]]}]]}]]}]]]}],[\"$\",\"div\",null,{\"className\":\"mt-4 flex justify-center md:mt-0\",\"children\":[[\"$\",\"a\",null,{\"target\":\"_blank\",\"rel\":\"noopener noreferrer\",\"href\":\"https://join.slack.com/t/apache-pinot/shared_invite/zt-5z7pav2f-yYtjZdVA~EDmrGkho87Vzw\",\"className\":\"mr-4\",\"children\":[\"$\",\"svg\",null,{\"xmlns\":\"http://www.w3.org/2000/svg\",\"width\":24,\"height\":24,\"viewBox\":\"0 0 24 24\",\"fill\":\"none\",\"stroke\":\"currentColor\",\"strokeWidth\":2,\"strokeLinecap\":\"round\",\"strokeLinejoin\":\"round\",\"className\":\"lucide lucide-slack fill-gray-900\",\"children\":[[\"$\",\"rect\",\"diqz80\",{\"width\":\"3\",\"height\":\"8\",\"x\":\"13\",\"y\":\"2\",\"rx\":\"1.5\"}],[\"$\",\"path\",\"183iwg\",{\"d\":\"M19 8.5V10h1.5A1.5 1.5 0 1 0 19 8.5\"}],[\"$\",\"rect\",\"hqg7r1\",{\"width\":\"3\",\"height\":\"8\",\"x\":\"8\",\"y\":\"14\",\"rx\":\"1.5\"}],[\"$\",\"path\",\"76g71w\",{\"d\":\"M5 15.5V14H3.5A1.5 1.5 0 1 0 5 15.5\"}],[\"$\",\"rect\",\"1kmz0a\",{\"width\":\"8\",\"height\":\"3\",\"x\":\"14\",\"y\":\"13\",\"rx\":\"1.5\"}],[\"$\",\"path\",\"jc4sz0\",{\"d\":\"M15.5 19H14v1.5a1.5 1.5 0 1 0 1.5-1.5\"}],[\"$\",\"rect\",\"1omvl4\",{\"width\":\"8\",\"height\":\"3\",\"x\":\"2\",\"y\":\"8\",\"rx\":\"1.5\"}],[\"$\",\"path\",\"16f3cl\",{\"d\":\"M8.5 5H10V3.5A1.5 1.5 0 1 0 8.5 5\"}],\"$undefined\"]}]}],[\"$\",\"a\",null,{\"target\":\"_blank\",\"rel\":\"noopener noreferrer\",\"href\":\"https://github.com/apache/pinot\",\"children\":[\"$\",\"svg\",null,{\"xmlns\":\"http://www.w3.org/2000/svg\",\"width\":24,\"height\":24,\"fill\":\"currentColor\",\"size\":24,\"children\":[[\"$\",\"g\",null,{\"clipPath\":\"url(#github_svg__a)\",\"children\":[\"$\",\"path\",null,{\"fillRule\":\"evenodd\",\"d\":\"M12.01 0C5.369 0 0 5.5 0 12.304c0 5.44 3.44 10.043 8.212 11.673.597.122.815-.265.815-.59 0-.286-.02-1.264-.02-2.283-3.34.734-4.036-1.466-4.036-1.466-.537-1.426-1.332-1.793-1.332-1.793-1.094-.754.08-.754.08-.754 1.212.082 1.849 1.263 1.849 1.263 1.073 1.874 2.803 1.345 3.5 1.019.098-.795.417-1.345.755-1.65-2.665-.285-5.468-1.345-5.468-6.07 0-1.345.477-2.445 1.232-3.3-.119-.306-.537-1.57.12-3.26 0 0 1.014-.326 3.3 1.263.98-.27 1.989-.407 3.003-.408 1.014 0 2.048.143 3.002.408 2.287-1.59 3.301-1.263 3.301-1.263.657 1.69.239 2.954.12 3.26.775.855 1.232 1.955 1.232 3.3 0 4.725-2.803 5.764-5.488 6.07.438.387.815 1.12.815 2.281 0 1.65-.02 2.975-.02 3.382 0 .326.22.713.816.59C20.56 22.347 24 17.744 24 12.305 24.02 5.5 18.63 0 12.01 0\",\"clipRule\":\"evenodd\"}]}],[\"$\",\"defs\",null,{\"children\":[\"$\",\"clipPath\",null,{\"id\":\"github_svg__a\",\"children\":[\"$\",\"path\",null,{\"fill\":\"#fff\",\"d\":\"M0 0h24v24H0z\"}]}]}]]}]}]]}]]}],[\"$\",\"div\",null,{\"className\":\"mt-8 border-t border-neutral-300 pt-4 text-left text-sm text-gray-600\",\"children\":[\"Copyright © \",2024,\" The Apache Software Foundation. Apache Pinot, Pinot, Apache, the Apache feather logo, and the Apache Pinot project logo are registered trademarks of The Apache Software Foundation. This page has references to third party software - Presto, PrestoDB, ThirdEye, Trino, TrinoDB, that are not part of the Apache Software Foundation and are not covered under the Apache License.\"]}]]}]]}]}]]}]}]]}],null]}]]\n"])</script><script>self.__next_f.push([1,"12:I[1514,[\"326\",\"static/chunks/326-3a90a6443b9c824c.js\",\"413\",\"static/chunks/413-f9f40b83f7bb3f22.js\",\"980\",\"static/chunks/980-6e243f9cd384c7d2.js\",\"797\",\"static/chunks/app/blog/%5B...slug%5D/page-502e08b6677b55da.js\"],\"\"]\n13:I[2529,[\"326\",\"static/chunks/326-3a90a6443b9c824c.js\",\"413\",\"static/chunks/413-f9f40b83f7bb3f22.js\",\"980\",\"static/chunks/980-6e243f9cd384c7d2.js\",\"797\",\"static/chunks/app/blog/%5B...slug%5D/page-502e08b6677b55da.js\"],\"\"]\n14:I[5185,[\"326\",\"static/chunks/326-3a90a6443b9c824c.js\",\"413\",\""])</script><script>self.__next_f.push([1,"static/chunks/413-f9f40b83f7bb3f22.js\",\"980\",\"static/chunks/980-6e243f9cd384c7d2.js\",\"797\",\"static/chunks/app/blog/%5B...slug%5D/page-502e08b6677b55da.js\"],\"\"]\n"])</script><script>self.__next_f.push([1,"10:[[\"$\",\"script\",null,{\"type\":\"application/ld+json\",\"dangerouslySetInnerHTML\":{\"__html\":\"{\\\"@context\\\":\\\"https://schema.org\\\",\\\"@type\\\":\\\"BlogPosting\\\",\\\"headline\\\":\\\"Segment Compaction for Upsert Enabled Tables in Apache Pinot\\\",\\\"datePublished\\\":\\\"2023-08-04T00:00:00.000Z\\\",\\\"dateModified\\\":\\\"2023-08-04T00:00:00.000Z\\\",\\\"description\\\":\\\"The blog post discusses the feature contribution of segment compaction to Apache Pinot project, addressing the issue of older records consuming unnecessary storage space. It explains the configuration and impact of segment compaction in freeing up storage. The author expresses gratitude and offers support for questions or feedback on segment compaction.\\\",\\\"image\\\":\\\"/static/images/twitter-card.png\\\",\\\"url\\\":\\\"https://pinot.apache.org/blog/2023-08-04-segment-compaction-for-upsert-enabled-tables-in-apache-pinot-3f30657aa077\\\",\\\"author\\\":[{\\\"@type\\\":\\\"Person\\\",\\\"name\\\":\\\"Robert Zych\\\"}]}\"}}],[\"$\",\"section\",null,{\"className\":\" px-5 pt-10 md:px-[13.313rem] md:py-16\",\"children\":[[\"$\",\"$L12\",null,{}],[\"$\",\"article\",null,{\"className\":\"\",\"children\":[\"$\",\"div\",null,{\"className\":\"mx-auto lg:flex\",\"children\":[[\"$\",\"div\",null,{\"className\":\"lg:pr-12\",\"children\":[[\"$\",\"header\",null,{\"className\":\"pt-6 md:pr-10\",\"children\":[[\"$\",\"h1\",null,{\"className\":\"text-4xl font-semibold\",\"children\":\"Segment Compaction for Upsert Enabled Tables in Apache Pinot\"}],[\"$\",\"p\",null,{\"className\":\"pt-2 text-lg\",\"children\":[\"By: \",\"Robert Zych\"]}],[\"$\",\"p\",null,{\"className\":\"py-2 text-sm\",\"children\":[\"August 4th, 2023\",\" • \",\"4 min read\"]}]]}],[\"$\",\"div\",null,{\"className\":\"flex flex-col lg:flex-row\",\"children\":[\"$\",\"main\",null,{\"className\":\"\",\"children\":[\"$\",\"div\",null,{\"className\":\"prose max-w-[45rem] pb-8 pt-10 dark:prose-invert\",\"children\":[[\"$\",\"p\",null,{\"children\":[\"I’m happy to share that my 1st feature contribution to the Apache Pinot project (\",[\"$\",\"a\",null,{\"target\":\"_blank\",\"rel\":\"noopener noreferrer\",\"href\":\"https://github.com/apache/pinot/pull/10463\",\"children\":\"Segment compaction for upsert enabled real-time tables\"}],\") was merged recently! In this post, I will briefly discuss the problem segment compaction addresses, how to configure it, and what it looks like in action. If you’re unfamiliar with Pinot’s Upsert features, I recommend reviewing \",[\"$\",\"a\",null,{\"target\":\"_blank\",\"rel\":\"noopener noreferrer\",\"href\":\"https://dev.startree.ai/docs/pinot/recipes/upserts-full\",\"children\":\"Full Upserts in Pinot\"}],\" to get started and \",[\"$\",\"a\",null,{\"target\":\"_blank\",\"rel\":\"noopener noreferrer\",\"href\":\"https://docs.pinot.apache.org/basics/data-import/upsert\",\"children\":\"Stream Ingestion with Upsert\"}],\" for more information.\"]}],[\"$\",\"h2\",null,{\"id\":\"context-and-configuration\",\"children\":[[\"$\",\"a\",null,{\"href\":\"#context-and-configuration\",\"aria-hidden\":\"true\",\"tabIndex\":\"-1\",\"children\":[\"$\",\"span\",null,{\"className\":\"icon icon-link\"}]}],\"Context and Configuration\"]}],[\"$\",\"p\",null,{\"children\":\"As Pinot’s Upsert stores all versions of the record ingested into immutable segments on disk, older records unnecessarily consume valuable storage space when they’re no longer used in query results. Pinot’s Segment Compaction reclaims this valuable storage space by introducing a periodic process that replaces the completed segments with compacted segments which only contain the latest version of the records. I recommend reviewing the Minion documentation if you’re unfamiliar with Pinot’s ability to run periodic processes.\"}],[\"$\",\"p\",null,{\"children\":\"With task scheduling enabled and an available Minion, you can configure segment compaction by adding the following to your table’s config.\"}],[\"$\",\"$L13\",null,{\"className\":\"language-json\",\"children\":[\"$\",\"code\",null,{\"className\":\"language-json code-highlight\",\"children\":[[\"$\",\"span\",null,{\"className\":\"code-line\",\"children\":[[\"$\",\"span\",null,{\"className\":\"token property\",\"children\":\"\\\"task\\\"\"}],[\"$\",\"span\",null,{\"className\":\"token operator\",\"children\":\":\"}],\" \",[\"$\",\"span\",null,{\"className\":\"token punctuation\",\"children\":\"{\"}],\"\\n\"]}],[\"$\",\"span\",null,{\"className\":\"code-line\",\"children\":[\"  \",[\"$\",\"span\",null,{\"className\":\"token property\",\"children\":\"\\\"taskTypeConfigsMap\\\"\"}],[\"$\",\"span\",null,{\"className\":\"token operator\",\"children\":\":\"}],\" \",[\"$\",\"span\",null,{\"className\":\"token punctuation\",\"children\":\"{\"}],\"\\n\"]}],[\"$\",\"span\",null,{\"className\":\"code-line\",\"children\":[\"    \",[\"$\",\"span\",null,{\"className\":\"token property\",\"children\":\"\\\"UpsertCompactionTask\\\"\"}],[\"$\",\"span\",null,{\"className\":\"token operator\",\"children\":\":\"}],\" \",[\"$\",\"span\",null,{\"className\":\"token punctuation\",\"children\":\"{\"}],\"\\n\"]}],[\"$\",\"span\",null,{\"className\":\"code-line\",\"children\":[\"      \",[\"$\",\"span\",null,{\"className\":\"token property\",\"children\":\"\\\"schedule\\\"\"}],[\"$\",\"span\",null,{\"className\":\"token operator\",\"children\":\":\"}],\" \",[\"$\",\"span\",null,{\"className\":\"token string\",\"children\":\"\\\"0 */5 * ? * *\\\"\"}],[\"$\",\"span\",null,{\"className\":\"token punctuation\",\"children\":\",\"}],\"\\n\"]}],[\"$\",\"span\",null,{\"className\":\"code-line\",\"children\":[\"      \",[\"$\",\"span\",null,{\"className\":\"token property\",\"children\":\"\\\"bufferTimePeriod\\\"\"}],[\"$\",\"span\",null,{\"className\":\"token operator\",\"children\":\":\"}],\" \",[\"$\",\"span\",null,{\"className\":\"token string\",\"children\":\"\\\"7d\\\"\"}],[\"$\",\"span\",null,{\"className\":\"token punctuation\",\"children\":\",\"}],\"\\n\"]}],[\"$\",\"span\",null,{\"className\":\"code-line\",\"children\":[\"      \",[\"$\",\"span\",null,{\"className\":\"token property\",\"children\":\"\\\"invalidRecordsThresholdPercent\\\"\"}],[\"$\",\"span\",null,{\"className\":\"token operator\",\"children\":\":\"}],\" \",[\"$\",\"span\",null,{\"className\":\"token string\",\"children\":\"\\\"30\\\"\"}],[\"$\",\"span\",null,{\"className\":\"token punctuation\",\"children\":\",\"}],\"\\n\"]}],[\"$\",\"span\",null,{\"className\":\"code-line\",\"children\":[\"      \",[\"$\",\"span\",null,{\"className\":\"token property\",\"children\":\"\\\"invalidRecordsThresholdCount\\\"\"}],[\"$\",\"span\",null,{\"className\":\"token operator\",\"children\":\":\"}],\" \",[\"$\",\"span\",null,{\"className\":\"token string\",\"children\":\"\\\"100000\\\"\"}],\"\\n\"]}],[\"$\",\"span\",null,{\"className\":\"code-line\",\"children\":[\"    \",[\"$\",\"span\",null,{\"className\":\"token punctuation\",\"children\":\"}\"}],\"\\n\"]}],[\"$\",\"span\",null,{\"className\":\"code-line\",\"children\":[\"  \",[\"$\",\"span\",null,{\"className\":\"token punctuation\",\"children\":\"}\"}],\"\\n\"]}],[\"$\",\"span\",null,{\"className\":\"code-line\",\"children\":[[\"$\",\"span\",null,{\"className\":\"token punctuation\",\"children\":\"}\"}],\"\\n\"]}]]}]}],[\"$\",\"p\",null,{\"children\":\"All the configs above (excluding schedule) determine which completed segments are selected for compaction.\"}],[\"$\",\"p\",null,{\"children\":\"bufferTimePeriod is the amount of time that has elapsed since the segment was consuming. In the example above, this has been set to “7d” which means that any segment that was completed over 7 days ago may be eligible for compaction. However, if you want to ensure that segments are compacted without any additional delay this config can be set to “0d”.\"}],[\"$\",\"p\",null,{\"children\":\"invalidRecordsThresholdPercent is a limit to the amount of older records allowed in the completed segment represented as a percentage of the total number of records in the segment (i.e. old records / total records). In the example above, this has been set to “30” which means that if more than 30% of the records in the completed segment are old, then the segment may be selected for compaction. As segment compaction is an expensive operation, it is not recommended to set this config (or invalidRecordsThresholdCount) too close to 1. This config is optional on the condition that invalidRecordsThresholdCount has been set and can be used in conjunction with invalidRecordsThresholdCount.\"}],[\"$\",\"p\",null,{\"children\":\"invalidRecordsThresholdCount is also a limit similar to invalidRecordsThresholdPercent, but allows you to express the threshold as a record count. In the example above, this has been set to “100000” which means that if the segment contains more than 100K records then it may be selected for compaction.\"}],[\"$\",\"h2\",null,{\"id\":\"example-use-case\",\"children\":[[\"$\",\"a\",null,{\"href\":\"#example-use-case\",\"aria-hidden\":\"true\",\"tabIndex\":\"-1\",\"children\":[\"$\",\"span\",null,{\"className\":\"icon icon-link\"}]}],\"Example Use Case\"]}],[\"$\",\"p\",null,{\"children\":\"I’ve created a data set that includes 24M records. The data set contains 240K unique keys that have each been duplicated 100 times.\"}],[\"$\",\"p\",null,{\"children\":[\"$\",\"img\",null,{\"alt\":\"alt\",\"src\":\"https://miro.medium.com/v2/resize:fit:4800/0*gEyUq_Tp4Fycgp2r\"}]}],[\"$\",\"p\",null,{\"children\":\"After ingesting the data set there are 6 segments (5 completed segments + 1 consuming segment) with a total estimated size of 22.8MB. Submitting the query “set skipUpsert=true; select count(*) from transcript_upsert” before compaction produces the following query result.\"}],[\"$\",\"p\",null,{\"children\":[\"$\",\"img\",null,{\"alt\":\"alt\",\"src\":\"https://miro.medium.com/v2/resize:fit:1064/0*GE3P1fqAcsr0Xs5A\"}]}],[\"$\",\"p\",null,{\"children\":\"After the compaction tasks are complete, the Minion Task Manager UI reports the following.\"}],[\"$\",\"p\",null,{\"children\":[\"$\",\"img\",null,{\"alt\":\"alt\",\"src\":\"https://miro.medium.com/v2/resize:fit:2000/0*SMxDZNndFwpoeNMI\"}]}],[\"$\",\"p\",null,{\"children\":\"Segment compaction generates a task for each segment to be compacted. 5 tasks were generated in this case because 90% of the records (3.6–4.5M records) are old in all 5 of the completed segments, therefore exceeding the configured thresholds. If a completed segment only contains old records, it is deleted immediately and a task isn’t generated to compact it.\"}],[\"$\",\"p\",null,{\"children\":[\"$\",\"img\",null,{\"alt\":\"alt\",\"src\":\"https://miro.medium.com/v2/resize:fit:1068/0*LB1itt-wCohpz42i\"}]}],[\"$\",\"p\",null,{\"children\":\"Submitting the query again we now see that count matches the set of 240K unique keys.\"}],[\"$\",\"p\",null,{\"children\":[\"$\",\"img\",null,{\"alt\":\"alt\",\"src\":\"https://miro.medium.com/v2/resize:fit:2000/0*cmx4zxoMsD4-tR_u\"}]}],[\"$\",\"p\",null,{\"children\":\"Once compaction has completed and the original segments have been replaced with their compacted counterparts we see that the total number of segments remained the same, but the total estimated size dropped to only 2.77MB! Since compaction can yield very small segments, one improvement would be to merge smaller segments into larger ones as this would improve query latency.\"}],[\"$\",\"h2\",null,{\"id\":\"conclusion\",\"children\":[[\"$\",\"a\",null,{\"href\":\"#conclusion\",\"aria-hidden\":\"true\",\"tabIndex\":\"-1\",\"children\":[\"$\",\"span\",null,{\"className\":\"icon icon-link\"}]}],\"Conclusion\"]}],[\"$\",\"p\",null,{\"children\":\"In this brief overview of Segment Compaction I covered the problem it addresses, how you can configure it, and demonstrated its ability to reclaim storage space. I’d like to thank Ankit Sultana, Seunghyun Lee, and especially Jackie Jiang for their feedback and support throughout the design and development stages. If you have any questions or feedback, I’m available on the Apache Pinot Slack.\"}]]}]}]}]]}],[\"$\",\"aside\",null,{\"className\":\"mt-10 hidden border-l-2 pl-5 lg:sticky lg:top-1 lg:block lg:h-full\",\"children\":[\"$\",\"section\",null,{\"className\":\"sticky top-0 mb-4 w-[15.375rem]\",\"children\":[[\"$\",\"div\",null,{\"className\":\"flex flex-col space-y-1.5 pb-3\",\"children\":[\"$\",\"h3\",null,{\"className\":\"text-sm font-semibold leading-snug text-neutral-500 dark:text-neutral-100\",\"children\":\"Table of Contents\"}]}],[\"$\",\"$L14\",null,{\"chapters\":[{\"value\":\"Context and Configuration\",\"url\":\"#context-and-configuration\",\"depth\":2},{\"value\":\"Example Use Case\",\"url\":\"#example-use-case\",\"depth\":2},{\"value\":\"Conclusion\",\"url\":\"#conclusion\",\"depth\":2}]}]]}]}]]}]}]]}]]\n"])</script><script>self.__next_f.push([1,"6:[[\"$\",\"meta\",\"0\",{\"name\":\"viewport\",\"content\":\"width=device-width, initial-scale=1\"}],[\"$\",\"meta\",\"1\",{\"charSet\":\"utf-8\"}],[\"$\",\"title\",\"2\",{\"children\":\"Segment Compaction for Upsert Enabled Tables in Apache Pinot | Apache Pinot™\"}],[\"$\",\"meta\",\"3\",{\"name\":\"description\",\"content\":\"The blog post discusses the feature contribution of segment compaction to Apache Pinot project, addressing the issue of older records consuming unnecessary storage space. It explains the configuration and impact of segment compaction in freeing up storage. The author expresses gratitude and offers support for questions or feedback on segment compaction.\"}],[\"$\",\"meta\",\"4\",{\"name\":\"robots\",\"content\":\"index, follow\"}],[\"$\",\"meta\",\"5\",{\"name\":\"googlebot\",\"content\":\"index, follow, max-video-preview:-1, max-image-preview:large, max-snippet:-1\"}],[\"$\",\"link\",\"6\",{\"rel\":\"canonical\",\"href\":\"https://pinot.apache.org/blog/2023/08/04/segment-compaction-for-upsert-enabled-tables-in-apache-pinot-3f30657aa077\"}],[\"$\",\"link\",\"7\",{\"rel\":\"alternate\",\"type\":\"application/rss+xml\",\"href\":\"https://pinot.apache.org/feed.xml\"}],[\"$\",\"meta\",\"8\",{\"property\":\"og:title\",\"content\":\"Segment Compaction for Upsert Enabled Tables in Apache Pinot\"}],[\"$\",\"meta\",\"9\",{\"property\":\"og:description\",\"content\":\"The blog post discusses the feature contribution of segment compaction to Apache Pinot project, addressing the issue of older records consuming unnecessary storage space. It explains the configuration and impact of segment compaction in freeing up storage. The author expresses gratitude and offers support for questions or feedback on segment compaction.\"}],[\"$\",\"meta\",\"10\",{\"property\":\"og:url\",\"content\":\"https://pinot.apache.org/blog/2023/08/04/segment-compaction-for-upsert-enabled-tables-in-apache-pinot-3f30657aa077\"}],[\"$\",\"meta\",\"11\",{\"property\":\"og:site_name\",\"content\":\"Apache Pinot™\"}],[\"$\",\"meta\",\"12\",{\"property\":\"og:locale\",\"content\":\"en_US\"}],[\"$\",\"meta\",\"13\",{\"property\":\"og:image\",\"content\":\"https://pinot.apache.org/static/images/twitter-card.png\"}],[\"$\",\"meta\",\"14\",{\"property\":\"og:type\",\"content\":\"article\"}],[\"$\",\"meta\",\"15\",{\"property\":\"article:published_time\",\"content\":\"2023-08-04T00:00:00.000Z\"}],[\"$\",\"meta\",\"16\",{\"property\":\"article:modified_time\",\"content\":\"2023-08-04T00:00:00.000Z\"}],[\"$\",\"meta\",\"17\",{\"property\":\"article:author\",\"content\":\"Robert Zych\"}],[\"$\",\"meta\",\"18\",{\"name\":\"twitter:card\",\"content\":\"summary_large_image\"}],[\"$\",\"meta\",\"19\",{\"name\":\"twitter:title\",\"content\":\"Segment Compaction for Upsert Enabled Tables in Apache Pinot\"}],[\"$\",\"meta\",\"20\",{\"name\":\"twitter:description\",\"content\":\"The blog post discusses the feature contribution of segment compaction to Apache Pinot project, addressing the issue of older records consuming unnecessary storage space. It explains the configuration and impact of segment compaction in freeing up storage. The author expresses gratitude and offers support for questions or feedback on segment compaction.\"}],[\"$\",\"meta\",\"21\",{\"name\":\"twitter:image\",\"content\":\"https://pinot.apache.org/static/images/twitter-card.png\"}],[\"$\",\"meta\",\"22\",{\"name\":\"next-size-adjust\"}]]\n"])</script><script>self.__next_f.push([1,"f:null\n"])</script><script>self.__next_f.push([1,""])</script></body></html>