| = Hardware Choices |
| |
| Like most databases, Cassandra throughput improves with more CPU cores, |
| more RAM, and faster disks. While Cassandra can be made to run on small |
| servers for testing or development environments (including Raspberry |
Pis), a minimal production server requires at least 2 cores and at
| least 8GB of RAM. Typical production servers have 8 or more cores and at |
| least 32GB of RAM. |
| |
| == CPU |
| |
| Cassandra is highly concurrent, handling many simultaneous requests |
| (both read and write) using multiple threads running on as many CPU |
| cores as possible. The Cassandra write path tends to be heavily |
| optimized (writing to the commitlog and then inserting the data into the |
| memtable), so writes, in particular, tend to be CPU bound. Consequently, |
| adding additional CPU cores often increases throughput of both reads and |
| writes. |
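
As a rough illustration, the `concurrent_reads` and `concurrent_writes`
settings in `cassandra.yaml` control how many of these threads run at
once. The values below follow the commonly cited rule of thumb of 8
writers per core (here assuming a hypothetical 8-core server) and are a
starting point for benchmarking, not a definitive recommendation:

[source,yaml]
----
# cassandra.yaml -- concurrency settings for a hypothetical 8-core server
concurrent_reads: 32       # reads often block on disk, so a moderate value works
concurrent_writes: 64      # writes are CPU bound; 8 * number_of_cores is a common rule of thumb
concurrent_counter_writes: 32
----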
| |
| == Memory |
| |
Cassandra runs within a Java VM, which pre-allocates a fixed-size heap
(the JVM's `-Xmx` parameter). In addition to the heap, Cassandra uses
significant amounts of RAM off-heap for compression metadata, bloom
filters, row, key, and counter caches, and an in-process page cache.
Finally, Cassandra takes advantage of the operating system's page
cache, storing recently accessed portions of files in RAM for rapid
re-use.
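
For example, on recent versions the heap size can be pinned in
`jvm.options` (older releases use `MAX_HEAP_SIZE` in
`cassandra-env.sh`); the 8GB figure below is purely illustrative for a
32GB server and should be tuned per the guidelines that follow:

[source,plaintext]
----
# jvm.options -- pin the heap so the JVM never resizes it at runtime.
# 8G is an illustrative value for a 32GB server, not a recommendation.
-Xms8G
-Xmx8G
----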
| |
| For optimal performance, operators should benchmark and tune their |
| clusters based on their individual workload. However, basic guidelines |
| suggest: |
| |
* ECC RAM should always be used, as Cassandra has few internal
safeguards to protect against bit-level corruption
* The Cassandra heap should be no less than 2GB, and no more than 50% of
your system RAM
* For heaps smaller than 12GB, consider ParNew/ConcurrentMarkSweep (CMS)
garbage collection
* For heaps larger than 12GB, consider either of the following (example
flags are sketched after this list):
** a 16GB heap with 8-10GB of new gen, a survivor ratio of 4-6, and a
maximum tenuring threshold of 6
** G1GC
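
As a concrete sketch of the large-heap CMS option, the `jvm.options`
lines below (applicable to the Java 8 era JVMs Cassandra has
traditionally run on; CMS was removed from later JDKs) set a 16GB heap
with a 10GB new gen, a survivor ratio of 6, and a maximum tenuring
threshold of 6. Treat them as a starting point for benchmarking:

[source,plaintext]
----
# jvm.options -- illustrative CMS settings for a large heap (Java 8)
-Xms16G
-Xmx16G
-Xmn10G                       # new gen size; guideline suggests 8-10GB
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:SurvivorRatio=6           # guideline suggests 4-6
-XX:MaxTenuringThreshold=6
# For the G1 alternative, replace the CMS-specific lines above with:
# -XX:+UseG1GC
----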
| |
| == Disks |
| |
| Cassandra persists data to disk for two very different purposes. The |
| first is to the commitlog when a new write is made so that it can be |
| replayed after a crash or system shutdown. The second is to the data |
| directory when thresholds are exceeded and memtables are flushed to disk |
| as SSTables. |
| |
| Commitlogs receive every write made to a Cassandra node and have the |
| potential to block client operations, but they are only ever read on |
| node start-up. SSTable (data file) writes on the other hand occur |
| asynchronously, but are read to satisfy client look-ups. SSTables are |
| also periodically merged and rewritten in a process called compaction. |
The commitlog directory therefore holds only data that has not yet been
permanently saved to the SSTable data directories; commitlog segments
are purged periodically once their contents have been flushed to the
SSTable data files.
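
Commitlog behavior is governed by a handful of `cassandra.yaml`
settings. The values below are typical defaults (exact defaults vary by
version), shown only to make the flush-then-purge cycle concrete:

[source,yaml]
----
# cassandra.yaml -- commitlog settings; values shown are typical defaults
commitlog_directory: /var/lib/cassandra/commitlog
commitlog_segment_size_in_mb: 32    # segments are recycled or purged once flushed
commitlog_total_space_in_mb: 8192   # hitting this cap forces memtable flushes
commitlog_sync: periodic            # fsync every commitlog_sync_period_in_ms
commitlog_sync_period_in_ms: 10000
----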
| |
| Cassandra performs very well on both spinning hard drives and solid |
| state disks. In both cases, Cassandra's sorted immutable SSTables allow |
| for linear reads, few seeks, and few overwrites, maximizing throughput |
| for HDDs and lifespan of SSDs by avoiding write amplification. However, |
| when using spinning disks, it's important that the commitlog |
| (`commitlog_directory`) be on one physical disk (not simply a partition, |
| but a physical disk), and the data files (`data_file_directories`) be |
| set to a separate physical disk. By separating the commitlog from the |
| data directory, writes can benefit from sequential appends to the |
| commitlog without having to seek around the platter as reads request |
| data from various SSTables on disk. |
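
For instance, with the commitlog on one physical disk and the data
files on another, the relevant `cassandra.yaml` entries might look like
the following (the mount points are hypothetical):

[source,yaml]
----
# cassandra.yaml -- one physical disk each; mount points are hypothetical
commitlog_directory: /mnt/disk1/cassandra/commitlog
data_file_directories:
    - /mnt/disk2/cassandra/data
----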
| |
Cassandra is designed to provide redundancy via multiple independent,
inexpensive servers. For this reason, using NFS or a SAN for data
directories is an antipattern and should typically be avoided.
Similarly, servers with multiple disks are often better served by RAID0
or JBOD than by RAID1 or RAID5: the replication Cassandra provides
obviates the need for replication at the disk layer, so operators are
typically advised to take advantage of the additional throughput of
RAID0 rather than protecting against failures with RAID1 or RAID5.
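
In the JBOD case, each disk is listed as its own entry in
`data_file_directories` and Cassandra spreads SSTables across them
(again with hypothetical mount points):

[source,yaml]
----
# cassandra.yaml -- JBOD: one data directory per physical disk
data_file_directories:
    - /mnt/disk1/cassandra/data
    - /mnt/disk2/cassandra/data
    - /mnt/disk3/cassandra/data
----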
| |
| == Common Cloud Choices |
| |
| Many large users of Cassandra run in various clouds, including AWS, |
| Azure, and GCE - Cassandra will happily run in any of these |
environments. Users should choose hardware similar to what would be
needed in a physical datacenter. In EC2, popular options include:
| |
| * i2 instances, which provide both a high RAM:CPU ratio and local |
| ephemeral SSDs |
| * i3 instances with NVMe disks |
** EBS is a workable alternative when easy backups and node
replacements matter more than raw disk performance
* m4.2xlarge / c4.4xlarge instances, which provide modern CPUs, enhanced
networking, and work well with EBS GP2 (SSD) storage
| |
| Generally, disk and network performance increases with instance size and |
| generation, so newer generations of instances and larger instance types |
| within each family often perform better than their smaller or older |
| alternatives. |