
 ---
 layout: docpage

 title: "Documentation"

 is_homepage: false
 is_sphinx_doc: true

 doc-parent: "Operating Cassandra"

 doc-title: "Hardware Choices"
 doc-header-links: '
   <link rel="top" title="Apache Cassandra Documentation v3.11.6" href="../index.html"/>
       <link rel="up" title="Operating Cassandra" href="index.html"/>
       <link rel="next" title="Cassandra Tools" href="../tools/index.html"/>
       <link rel="prev" title="Security" href="security.html"/>
 '
 doc-search-path: "../search.html"

 extra-footer: '
 <script type="text/javascript">
     var DOCUMENTATION_OPTIONS = {
       URL_ROOT:    "",
       VERSION:     "",
       COLLAPSE_INDEX: false,
       FILE_SUFFIX: ".html",
       HAS_SOURCE:  false,
       SOURCELINK_SUFFIX: ".txt"
     };
 </script>
 '

 ---
 <div class="container-fluid">
   <div class="row">
     <div class="col-md-3">
       <div class="doc-navigation">
         <div class="doc-menu" role="navigation">
           <div class="navbar-header">
             <button type="button" class="pull-left navbar-toggle" data-toggle="collapse" data-target=".sidebar-navbar-collapse">
               <span class="sr-only">Toggle navigation</span>
               <span class="icon-bar"></span>
               <span class="icon-bar"></span>
               <span class="icon-bar"></span>
             </button>
           </div>
           <div class="navbar-collapse collapse sidebar-navbar-collapse">
             <form id="doc-search-form" class="navbar-form" action="../search.html" method="get" role="search">
               <div class="form-group">
                 <input type="text" size="30" class="form-control input-sm" name="q" placeholder="Search docs">
                 <input type="hidden" name="check_keywords" value="yes" />
                 <input type="hidden" name="area" value="default" />
               </div>
             </form>


             <ul class="current">
 <li class="toctree-l1"><a class="reference internal" href="../getting_started/index.html">Getting Started</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../architecture/index.html">Architecture</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../data_modeling/index.html">Data Modeling</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../cql/index.html">The Cassandra Query Language (CQL)</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../configuration/index.html">Configuring Cassandra</a></li>
 <li class="toctree-l1 current"><a class="reference internal" href="index.html">Operating Cassandra</a><ul class="current">
 <li class="toctree-l2"><a class="reference internal" href="snitch.html">Snitch</a></li>
 <li class="toctree-l2"><a class="reference internal" href="topo_changes.html">Adding, replacing, moving and removing nodes</a></li>
 <li class="toctree-l2"><a class="reference internal" href="repair.html">Repair</a></li>
 <li class="toctree-l2"><a class="reference internal" href="read_repair.html">Read repair</a></li>
 <li class="toctree-l2"><a class="reference internal" href="hints.html">Hints</a></li>
 <li class="toctree-l2"><a class="reference internal" href="compaction.html">Compaction</a></li>
 <li class="toctree-l2"><a class="reference internal" href="bloom_filters.html">Bloom Filters</a></li>
 <li class="toctree-l2"><a class="reference internal" href="compression.html">Compression</a></li>
 <li class="toctree-l2"><a class="reference internal" href="cdc.html">Change Data Capture</a></li>
 <li class="toctree-l2"><a class="reference internal" href="backups.html">Backups</a></li>
 <li class="toctree-l2"><a class="reference internal" href="bulk_loading.html">Bulk Loading</a></li>
 <li class="toctree-l2"><a class="reference internal" href="metrics.html">Monitoring</a></li>
 <li class="toctree-l2"><a class="reference internal" href="security.html">Security</a></li>
 <li class="toctree-l2 current"><a class="current reference internal" href="#">Hardware Choices</a><ul>
 <li class="toctree-l3"><a class="reference internal" href="#cpu">CPU</a></li>
 <li class="toctree-l3"><a class="reference internal" href="#memory">Memory</a></li>
 <li class="toctree-l3"><a class="reference internal" href="#disks">Disks</a></li>
 <li class="toctree-l3"><a class="reference internal" href="#common-cloud-choices">Common Cloud Choices</a></li>
 </ul>
 </li>
 </ul>
 </li>
 <li class="toctree-l1"><a class="reference internal" href="../tools/index.html">Cassandra Tools</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../troubleshooting/index.html">Troubleshooting</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../development/index.html">Cassandra Development</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../faq/index.html">Frequently Asked Questions</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../bugs.html">Reporting Bugs and Contributing</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../contactus.html">Contact us</a></li>
 </ul>


           </div><!--/.nav-collapse -->
         </div>
       </div>
     </div>
     <div class="col-md-8">
       <div class="content doc-content">
         <div class="content-container">

   <div class="section" id="hardware-choices">
 <h1>Hardware Choices<a class="headerlink" href="#hardware-choices" title="Permalink to this headline">¶</a></h1>
 <p>Like most databases, Cassandra throughput improves with more CPU cores, more RAM, and faster disks. While Cassandra can
 be made to run on small servers for testing or development environments (including Raspberry Pis), a minimal production
 server requires at least 2 cores, and at least 8GB of RAM. Typical production servers have 8 or more cores and at least
 32GB of RAM.</p>
 <div class="section" id="cpu">
 <h2>CPU<a class="headerlink" href="#cpu" title="Permalink to this headline">¶</a></h2>
 <p>Cassandra is highly concurrent, handling many simultaneous requests (both read and write) using multiple threads running
 on as many CPU cores as possible. The Cassandra write path tends to be heavily optimized (writing to the commitlog and
 then inserting the data into the memtable), so writes, in particular, tend to be CPU bound. Consequently, adding
 additional CPU cores often increases throughput of both reads and writes.</p>
 </div>
 <div class="section" id="memory">
 <h2>Memory<a class="headerlink" href="#memory" title="Permalink to this headline">¶</a></h2>
 <p>Cassandra runs within a Java VM, which pre-allocates a fixed-size heap (set with Java’s <code class="docutils literal notranslate"><span class="pre">-Xmx</span></code> option). In addition to
 the heap, Cassandra uses significant amounts of off-heap RAM for compression metadata, bloom filters, the row, key, and
 counter caches, and an in-process page cache. Finally, Cassandra takes advantage of the operating system’s page
 cache, which keeps recently accessed portions of files in RAM for rapid re-use.</p>
 <p>For optimal performance, operators should benchmark and tune their clusters based on their individual workload. However,
 basic guidelines suggest:</p>
 <ul class="simple">
 <li>ECC RAM should always be used, as Cassandra has few internal safeguards to protect against bit-level corruption</li>
 <li>The Cassandra heap should be no less than 2GB, and no more than 50% of your system RAM</li>
 <li>For heaps smaller than 12GB, consider ParNew/ConcurrentMarkSweep garbage collection</li>
 <li>For heaps larger than 12GB, consider G1GC</li>
 </ul>
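 <p>As a rough illustration of the guidelines above, the following Python sketch clamps a desired heap size to the
 2GB to 50%-of-RAM range and names the collector to consider. The function name <code class="docutils literal notranslate"><span class="pre">suggest_heap</span></code> is hypothetical
 and is not part of Cassandra, and treating a heap of exactly 12GB as a ParNew/CMS case is an assumption, since the
 guidelines do not cover that boundary.</p>

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def suggest_heap(system_ram_bytes, desired_heap_bytes):
    """Clamp a desired heap size to the guideline range and suggest a collector.

    Hypothetical helper: it only encodes the rules of thumb listed above and
    does not replace benchmarking against a real workload.
    """
    min_heap = 2 * GIB                 # heap should be no less than 2GB
    max_heap = system_ram_bytes // 2   # and no more than 50% of system RAM
    if max_heap < min_heap:
        raise ValueError("system RAM is too small to satisfy both heap guidelines")

    heap = max(min_heap, min(desired_heap_bytes, max_heap))

    # Assumption: a heap of exactly 12GB is grouped with the smaller heaps.
    collector = "G1GC" if heap > 12 * GIB else "ParNew/CMS"
    return heap, collector


if __name__ == "__main__":
    heap, collector = suggest_heap(32 * GIB, 24 * GIB)
    print(heap // GIB, collector)
```

 <p>On a 32GB server, a requested 24GB heap is reduced to 16GB (half of system RAM), which falls in the range where G1GC
 is the collector to consider. Any value produced this way is a starting point for tuning, not a final setting.</p>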
 </div>
 <div class="section" id="disks">
 <h2>Disks<a class="headerlink" href="#disks" title="Permalink to this headline">¶</a></h2>
 <p>Cassandra persists data to disk for two very different purposes. The first is to the commitlog when a new write is made
 so that it can be replayed after a crash or system shutdown. The second is to the data directory when thresholds are
 exceeded and memtables are flushed to disk as SSTables.</p>
 <p>Commitlogs receive every write made to a Cassandra node and have the potential to block client operations, but they are
 only ever read on node start-up. SSTable (data file) writes, on the other hand, occur asynchronously, but SSTables are read to
 satisfy client look-ups. SSTables are also periodically merged and rewritten in a process called compaction. The
 commitlog directory holds data that has not yet been permanently saved to the SSTable data directories; it is
 periodically purged once that data has been flushed to the SSTable data files.</p>
 <p>Cassandra performs very well on both spinning hard drives and solid state disks. In both cases, Cassandra’s sorted
 immutable SSTables allow for linear reads, few seeks, and few overwrites, maximizing throughput for HDDs and lifespan of
 SSDs by avoiding write amplification. However, when using spinning disks, it’s important that the commitlog
 (<code class="docutils literal notranslate"><span class="pre">commitlog_directory</span></code>) be on one physical disk (not simply a partition, but a physical disk), and the data files
 (<code class="docutils literal notranslate"><span class="pre">data_file_directories</span></code>) be set to a separate physical disk. By separating the commitlog from the data directory,
 writes can benefit from sequential appends to the commitlog without having to seek around the platter as reads request
 data from various SSTables on disk.</p>
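 <p>A quick sanity check for the separation described above might look like the following Python sketch. The function
 name <code class="docutils literal notranslate"><span class="pre">on_separate_devices</span></code> and the example paths are hypothetical. The check compares the
 <code class="docutils literal notranslate"><span class="pre">st_dev</span></code> value reported by <code class="docutils literal notranslate"><span class="pre">os.stat</span></code>, which identifies the device holding the filesystem. Two
 partitions of one physical disk report different values, so a passing result is necessary but not sufficient: it
 cannot prove that the directories sit on different physical disks.</p>

```python
import os


def on_separate_devices(commitlog_dir, data_dirs, stat=os.stat):
    """Return True if no data directory shares a device ID with the commitlog.

    Hypothetical helper. st_dev identifies the device backing a filesystem,
    so this catches directories placed on the same filesystem, but it cannot
    tell two partitions of one physical disk apart.
    """
    commitlog_dev = stat(commitlog_dir).st_dev
    return all(stat(d).st_dev != commitlog_dev for d in data_dirs)


if __name__ == "__main__":
    # Example paths only; substitute the values of commitlog_directory and
    # data_file_directories from your own configuration.
    commitlog = "/var/lib/cassandra/commitlog"
    data = ["/var/lib/cassandra/data"]
    if os.path.isdir(commitlog) and all(os.path.isdir(d) for d in data):
        print(on_separate_devices(commitlog, data))
```

 <p>The <code class="docutils literal notranslate"><span class="pre">stat</span></code> parameter exists only so the function can be exercised without real mount points. If the
 check returns <code class="docutils literal notranslate"><span class="pre">False</span></code>, the commitlog shares a filesystem with at least one data directory.</p>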
 <p>In most cases, Cassandra is designed to provide redundancy via multiple independent, inexpensive servers. For this
 reason, using NFS or a SAN for data directories is an antipattern and should typically be avoided. Similarly, servers
 with multiple disks are often better served by RAID0 or JBOD than by RAID1 or RAID5. Replication provided by
 Cassandra removes the need for replication at the disk layer, so it’s typically recommended that operators take
 advantage of the additional throughput of RAID0 rather than protecting against failures with RAID1 or RAID5.</p>
 </div>
 <div class="section" id="common-cloud-choices">
 <h2>Common Cloud Choices<a class="headerlink" href="#common-cloud-choices" title="Permalink to this headline">¶</a></h2>
 <p>Many large users of Cassandra run in various clouds, including AWS, Azure, and GCE, and Cassandra will happily run in any
 of these environments. Users should choose hardware similar to what they would need in a physical data center. In EC2, popular
 options include:</p>
 <ul class="simple">
 <li>m1.xlarge instances, which provide 1.6TB of local ephemeral spinning storage and sufficient RAM to run moderate
 workloads</li>
 <li>i2 instances, which provide both a high RAM:CPU ratio and local ephemeral SSDs</li>
 <li>m4.2xlarge / c4.4xlarge instances, which provide modern CPUs, enhanced networking and work well with EBS GP2 (SSD)
 storage</li>
 </ul>
 <p>Generally, disk and network performance increase with instance size and generation, so newer generations of instances
 and larger instance types within each family often perform better than their smaller or older alternatives.</p>
 </div>
 </div>


           <div class="doc-prev-next-links" role="navigation" aria-label="footer navigation">

             <a href="../tools/index.html" class="btn btn-default pull-right " role="button" title="Cassandra Tools" accesskey="n">Next <span class="glyphicon glyphicon-circle-arrow-right" aria-hidden="true"></span></a>


             <a href="security.html" class="btn btn-default" role="button" title="Security" accesskey="p"><span class="glyphicon glyphicon-circle-arrow-left" aria-hidden="true"></span> Previous</a>

           </div>

         </div>
       </div>
     </div>
   </div>
 </div>