<!DOCTYPE html>
<html lang="en">
<link rel="stylesheet" href="/bootstrap/css/bootstrap.min.css">
<script src="/bootstrap/js/bootstrap.bundle.min.js"></script>
<link rel="stylesheet" type="text/css" href="/font-awesome/css/font-awesome.min.css">
<script src="/js/anchor.min.js"></script>
<script src="/js/flink.js"></script>
<link rel="canonical" href="">
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="description" content="When scheduling large-scale jobs in Flink 1.12, a lot of time is required to initialize jobs and deploy tasks. The scheduler also requires a large amount of heap memory in order to store the execution topology and host temporary deployment descriptors. For example, for a job with a topology that contains two vertices connected with an all-to-all edge and a parallelism of 10k (which means there are 10k source tasks and 10k sink tasks and every source task is connected to all sink tasks), Flink’s JobManager would require 30 GiB of heap memory and more than 4 minutes to deploy all of the tasks.">
<meta name="theme-color" content="#FFFFFF"><meta property="og:title" content="How We Improved Scheduler Performance for Large-scale Jobs - Part One" />
<meta property="og:description" content="When scheduling large-scale jobs in Flink 1.12, a lot of time is required to initialize jobs and deploy tasks. The scheduler also requires a large amount of heap memory in order to store the execution topology and host temporary deployment descriptors. For example, for a job with a topology that contains two vertices connected with an all-to-all edge and a parallelism of 10k (which means there are 10k source tasks and 10k sink tasks and every source task is connected to all sink tasks), Flink’s JobManager would require 30 GiB of heap memory and more than 4 minutes to deploy all of the tasks." />
<meta property="og:type" content="article" />
<meta property="og:url" content="" /><meta property="article:section" content="posts" />
<meta property="article:published_time" content="2022-01-04T08:00:00+00:00" />
<meta property="article:modified_time" content="2022-01-04T08:00:00+00:00" />
<title>How We Improved Scheduler Performance for Large-scale Jobs - Part One | Apache Flink</title>
<link rel="manifest" href="/manifest.json">
<link rel="icon" href="/favicon.png" type="image/x-icon">
<link rel="stylesheet" href="/book.min.22eceb4d17baa9cdc0f57345edd6f215a40474022dfee39b63befb5fb3c596b5.css" integrity="sha256-IuzrTRe6qc3A9XNF7dbyFaQEdAIt/uObY777X7PFlrU=">
<script defer src="/" integrity="sha256-uvY1qw4Sf4AVLdHaS1JKXepny5zA/rIXELUYitqcFcE="></script>
<meta name="generator" content="Hugo 0.124.1">
<body>
<main class="flex">
<section class="container book-page">
<article class="markdown">
<a href="/2022/01/04/how-we-improved-scheduler-performance-for-large-scale-jobs-part-one/">How We Improved Scheduler Performance for Large-scale Jobs - Part One</a>
January 4, 2022 -
Zhilong Hong
Zhu Zhu
Daisy Tsang
Till Rohrmann
<a href="">(@stsffap)</a>
<p><h1 id="introduction">
Introduction
<a class="anchor" href="#introduction">#</a>
<p>When scheduling large-scale jobs in Flink 1.12, initializing jobs and deploying tasks takes a long time. The scheduler also needs a large amount of heap memory to store the execution topology and to hold temporary deployment descriptors. For example, consider a job whose topology contains two vertices connected by an all-to-all edge, each with a parallelism of 10k (that is, 10k source tasks and 10k sink tasks, with every source task connected to every sink task). For this job, Flink’s JobManager requires 30 GiB of heap memory and more than 4 minutes to deploy all of the tasks.</p>
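<p>To get a feel for the scale of this example, the number of task-to-task connections across an all-to-all edge is the product of the two parallelisms. The helper below is a hypothetical illustration of that arithmetic, not Flink code:</p>

```python
def all_to_all_edge_count(upstream_parallelism: int, downstream_parallelism: int) -> int:
    """Number of task-to-task connections across one all-to-all edge."""
    return upstream_parallelism * downstream_parallelism


# The example job: 10k source tasks, each connected to all 10k sink tasks.
edges = all_to_all_edge_count(10_000, 10_000)
print(f"{edges:,} connections")  # 100,000,000 connections
```

<p>Any per-connection bookkeeping therefore has to be paid a hundred million times for this one job.</p>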
<p>Furthermore, task deployment may block the JobManager&rsquo;s main thread for a long time, during which the JobManager cannot respond to any other requests from TaskManagers. This can lead to heartbeat timeouts that trigger a failover. In the worst case, the Flink cluster becomes unusable because it cannot deploy the job.</p>
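<p>The effect of a blocked main thread can be pictured with a small simulation. This is only a sketch under stated assumptions: a single thread handles requests strictly one at a time, and the event names, durations, and the 50-second timeout are made-up values for illustration rather than measurements or Flink internals.</p>

```python
from dataclasses import dataclass


@dataclass
class Event:
    arrival_s: float   # when the request reaches the main thread's queue
    duration_s: float  # how long the main thread needs to process it
    name: str


def response_delays(events):
    """Process events one at a time, in arrival order, on a single thread.

    Returns {name: seconds between arrival and completion}.
    """
    clock = 0.0
    delays = {}
    for e in sorted(events, key=lambda ev: ev.arrival_s):
        start = max(clock, e.arrival_s)  # wait if the thread is still busy
        clock = start + e.duration_s
        delays[e.name] = clock - e.arrival_s
    return delays


HEARTBEAT_TIMEOUT_S = 50.0  # assumed timeout, for illustration only

events = [
    Event(0.0, 60.0, "deploy-tasks"),   # long blocking deployment (made-up duration)
    Event(5.0, 0.001, "heartbeat-1"),   # arrives while deployment is running
    Event(65.0, 0.001, "heartbeat-2"),  # arrives after deployment has finished
]
delays = response_delays(events)
timed_out = [
    name for name, delay in delays.items()
    if name.startswith("heartbeat") and delay > HEARTBEAT_TIMEOUT_S
]
print(timed_out)  # ['heartbeat-1']
```

<p>The heartbeat that arrives during deployment waits behind it and exceeds the assumed timeout, while the one that arrives afterwards is answered immediately.</p>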
<p>To improve the performance of the scheduler for large-scale jobs, we&rsquo;ve implemented several optimizations in Flink 1.13 and 1.14:</p>
<ul>
<li>Introduce the concept of consuming groups to optimize procedures whose cost depends on the complexity of the topology, including initialization, scheduling, failover, and partition release. This also reduces the memory required to store the topology;</li>
<li>Introduce a cache to optimize task deployment, which makes the process faster and requires less memory;</li>
<li>Leverage characteristics of the logical topology and the scheduling topology to speed up the building of pipelined regions.</li>
</ul>
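<p>As a rough intuition for the first item only: when every consumer of an all-to-all edge reads the same set of partitions, that set can be stored once and shared instead of being recorded per connection. The sketch below counts stored references under that assumption. The function names are hypothetical and this is not the scheduler's actual data structure.</p>

```python
def per_edge_references(producers: int, consumers: int) -> int:
    """Every consumer keeps its own reference to every producer partition."""
    return producers * consumers


def grouped_references(producers: int, consumers: int) -> int:
    """All consumers share one group object that lists the partitions once."""
    group_members = producers   # one entry per partition inside the shared group
    group_pointers = consumers  # one pointer to the shared group per consumer
    return group_members + group_pointers


n = 10_000
print(per_edge_references(n, n))  # 100000000
print(grouped_references(n, n))   # 20000
```

<p>Under this simplified model, the bookkeeping grows linearly with the parallelism rather than quadratically.</p>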
<h1 id="benchmarking-results">
Benchmarking Results
<a class="anchor" href="#benchmarking-results">#</a>
<p>To estimate the effect of our optimizations, we ran several experiments comparing Flink 1.12 (before the optimizations) with Flink 1.14 (after the optimizations). The job in our experiments contains two vertices connected by an all-to-all edge, each with a parallelism of 10k. To ensure that temporary deployment descriptors are distributed via the blob server, we set <a href="//">blob.offload.minsize</a> to 100 KiB (the default is 1 MiB). With this setting, blobs larger than the configured value are distributed via the blob server; the deployment descriptors in our test job are about 270 KiB each. The results of our experiments are shown below:</p>
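<p>The reason for lowering the threshold can be checked with a few lines. This is a simplified model of the rule stated above (larger than the threshold means offloaded); the function name is hypothetical and the real decision logic in Flink may involve more than a size comparison.</p>

```python
KIB = 1024
MIB = 1024 * KIB


def is_offloaded(blob_size_bytes: int, offload_min_size_bytes: int) -> bool:
    """Simplified rule from the text: blobs larger than the threshold go via the blob server."""
    return blob_size_bytes > offload_min_size_bytes


descriptor_size = 270 * KIB  # approximate descriptor size in the test job

print(is_offloaded(descriptor_size, 1 * MIB))    # False: below the default threshold
print(is_offloaded(descriptor_size, 100 * KIB))  # True: above the lowered threshold
```

<p>With the default of 1 MiB, the roughly 270 KiB descriptors would stay below the threshold, so the experiment would not exercise the blob server path.</p>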
Table 1 - The comparison of time cost between Flink 1.12 and 1.14
<table width="95%" border="1">
<tr>
<th style="text-align: center">Procedure</th>
<th style="text-align: center">1.12</th>
<th style="text-align: center">1.14</th>
<th style="text-align: center">Reduction (%)</th>
</tr>
<tr>
<td style="text-align: center">Job initialization</td>
<td style="text-align: center">11,431 ms</td>
<td style="text-align: center">627 ms</td>
<td style="text-align: center">94.51%</td>
</tr>
<tr>
<td style="text-align: center">Task deployment</td>
<td style="text-align: center">63,118 ms</td>
<td style="text-align: center">17,183 ms</td>
<td style="text-align: center">72.78%</td>
</tr>
<tr>
<td style="text-align: center">Computing tasks to restart on failover</td>
<td style="text-align: center">37,195 ms</td>
<td style="text-align: center">170 ms</td>
<td style="text-align: center">99.55%</td>
</tr>
</table>
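<p>The reduction column follows from the two timing columns as (before - after) / before. The short script below recomputes it from the millisecond values in Table 1; because those values are rounded, the result matches the reported percentages only to within about 0.01 percentage points.</p>

```python
def reduction_percent(before_ms: float, after_ms: float) -> float:
    """Relative time saved, as a percentage of the original duration."""
    return (before_ms - after_ms) / before_ms * 100.0


# (Flink 1.12, Flink 1.14) timings in milliseconds, taken from Table 1.
timings_ms = {
    "Job initialization": (11_431, 627),
    "Task deployment": (63_118, 17_183),
    "Computing tasks to restart on failover": (37_195, 170),
}

for name, (before, after) in timings_ms.items():
    print(f"{name}: {reduction_percent(before, after):.2f}%")
```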
<p>In addition to the speedup, memory usage is significantly reduced. With Flink 1.12, the JobManager needs 30 GiB of heap memory to deploy the test job and keep it running stably, while with Flink 1.14 the minimum heap memory required is only 2 GiB.</p>
<p>There are also fewer long garbage collection pauses. When running the test job with Flink 1.12, a garbage collection lasting more than 10 seconds occurs during both job initialization and task deployment. With Flink 1.14, no such long garbage collection occurs, which lowers the risk of heartbeat timeouts and improves cluster stability.</p>
<p>In our experiment, it took more than 4 minutes for the large-scale job to transition to running with Flink 1.12 (excluding the time spent on allocating resources). With Flink 1.14, it took no more than 30 seconds (again excluding resource allocation), a reduction of 87%. If you run large-scale jobs in production and want better scheduling performance, consider upgrading to Flink 1.14.</p>
<p>In <a href="/2022/01/04/scheduler-performance-part-two">part two</a> of this blog post, we discuss these improvements in detail.</p>
<aside class="book-toc">
<nav id="TableOfContents"><h3>On This Page <a href="javascript:void(0)" class="toc" onclick="collapseToc()"><i class="fa fa-times" aria-hidden="true"></i></a></h3>
<li><a href="#introduction">Introduction</a></li>
<li><a href="#benchmarking-results">Benchmarking Results</a></li>
<div class="container disclaimer">
<p>The contents of this website are © 2024 Apache Software Foundation under the terms of the Apache License v2. Apache Flink, Flink, and the Flink logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.</p>